Need admin help with unsubscribe

2011-01-09 Thread Imre András
Hello all,

Sorry for posting this, but unsubscribe did not work for me. Please refer to my 
second attempt below. The first was on 01/12/2010.


Thanks,
  András


--Original message--
Date:    Tuesday, December 7, 2010, 02:16:57
From:    Imre András ia...@freemail.hu
Subject: unsubscribe
To:      pylucene-dev-unsubscr...@lucene.apache.org
unsubscribe



[jira] Commented: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key

2011-01-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979274#action_12979274
 ] 

Simon Willnauer commented on LUCENE-2855:
-

+1 - just put your name after the description in the changes.txt 

 Contrib queryparser should not use CharSequence as Map key
 --

 Key: LUCENE-2855
 URL: https://issues.apache.org/jira/browse/LUCENE-2855
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 3.0.3
Reporter: Adriano Crestani
Assignee: Adriano Crestani
 Fix For: 3.0.4

 Attachments: lucene_2855_adriano_crestani_2011_01_08.patch


 Today, contrib query parser uses Map<CharSequence,...> in many different 
 places, which may lead to problems, since the CharSequence interface does not 
 enforce the implementation of hashCode and equals methods. Today, it's 
 causing a problem with QueryTreeBuilder.setBuilder(CharSequence,QueryBuilder), 
 which does not work as expected.
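The pitfall in the description can be reproduced with a few lines of plain Java (illustrative only, not the contrib queryparser code itself):

```java
import java.util.HashMap;
import java.util.Map;

// Two CharSequence implementations holding the same characters need not be
// equal, so a Map<CharSequence, V> silently misses entries depending on which
// implementation is used for the lookup.
public class CharSequenceKeyPitfall {
    public static void main(String[] args) {
        Map<CharSequence, String> builders = new HashMap<>();
        builders.put("field", "builder-for-field");          // key is a String

        CharSequence sameChars = new StringBuilder("field"); // same chars, different class
        // StringBuilder inherits Object's identity-based equals/hashCode,
        // so this lookup is guaranteed to fail:
        System.out.println(builders.get(sameChars));            // prints: null
        System.out.println(builders.get("field"));              // prints: builder-for-field

        // Normalizing keys to String at the map boundary is the usual fix:
        System.out.println(builders.get(sameChars.toString())); // prints: builder-for-field
    }
}
```

Normalizing keys to a type with value-based equals/hashCode (such as String) at the map boundary avoids the problem entirely.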

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979276#action_12979276
 ] 

Earwin Burrfoot commented on LUCENE-2840:
-

bq. But doesn't that mean that an app w/ rare queries but each query is massive 
fails to use all available concurrency?
Yes. But that's not my case. And likely not someone else's.

I think if you want to be super-generic, it's better to defer exact threading 
to the user, instead of doing a one-size-fits-all solution. Else you risk 
conjuring another ConcurrentMergeScheduler.
While we're at it, we can throw in some sample implementation, which can 
satisfy some of the users, but not everyone.

 Multi-Threading in IndexSearcher (after removal of MultiSearcher and 
 ParallelMultiSearcher)
 ---

 Key: LUCENE-2840
 URL: https://issues.apache.org/jira/browse/LUCENE-2840
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Search
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 4.0


 Spin-off from parent issue:
 {quote}
 We should discuss about how many threads should be spawned. If you have an 
 index with many segments, even small ones, I think only the larger segments 
 should be separate threads, all others should be handled sequentially. So 
 maybe add a maxThreads count, then sort the IndexReaders by maxDoc and then 
 only spawn maxThreads-1 threads for the bigger readers and then one 
 additional thread for the rest?
 {quote}
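The partitioning sketched in the quote could look roughly like this (hypothetical names and types, not Lucene's actual API): each of the (maxThreads - 1) largest segments gets its own work unit, and all remaining small segments are lumped into one final unit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the proposed thread partitioning, driven only by
// per-segment sizes (maxDoc values).
public class SegmentPartitioner {
    static List<List<Integer>> partition(List<Integer> maxDocs, int maxThreads) {
        List<Integer> sorted = new ArrayList<>(maxDocs);
        sorted.sort(Comparator.reverseOrder());  // biggest segments first
        List<List<Integer>> units = new ArrayList<>();
        int big = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < big; i++) {
            units.add(List.of(sorted.get(i)));   // one thread per large segment
        }
        List<Integer> rest = sorted.subList(big, sorted.size());
        if (!rest.isEmpty()) {
            units.add(new ArrayList<>(rest));    // one thread handles all small ones
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of(1000, 10, 500, 20, 5), 3));
        // prints: [[1000], [500], [20, 10, 5]]
    }
}
```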




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979277#action_12979277
 ] 

Earwin Burrfoot commented on LUCENE-2843:
-

And we're nearing a day when we keep the whole term dictionary in memory (as 
Sphinx does, for instance).
At that point, a gazillion term lookup-related hacks (like the lookup cache) 
become obsolete :)
The term dictionary itself can also be memory-mapped after this, instead of being 
read and built from disk, which makes new segment opening 
near-instantaneous.
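The memory-mapping idea can be sketched with java.nio (the file name and layout here are invented; this is not Lucene's terms dict format):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Rather than reading a terms file and building structures up front at
// segment-open time, map it read-only and let the page cache fault bytes
// in lazily.
public class MmapSketch {
    static byte byteAt(Path file, int offset) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            // map() is cheap: no bytes are copied eagerly, so "opening" is near-instant.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("terms", ".bin");
        Files.write(p, new byte[]{10, 20, 30, 40});
        System.out.println(byteAt(p, 2));  // prints: 30
        Files.delete(p);
    }
}
```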

 Add variable-gap terms index impl.
 --

 Key: LUCENE-2843
 URL: https://issues.apache.org/jira/browse/LUCENE-2843
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2843.patch, LUCENE-2843.patch


 PrefixCodedTermsReader/Writer (used by all real core codecs) already
 supports pluggable terms index impls.
 The only impl we have now is FixedGapTermsIndexReader/Writer, which
 picks every Nth (default 32) term and holds it in efficient packed
 int/byte arrays in RAM.  This is already an enormous improvement (RAM
 reduction, init time) over 3.x.
 This patch adds another impl, VariableGapTermsIndexReader/Writer,
 which lets you specify an arbitrary IndexTermSelector to pick which
 terms are indexed, and then uses an FST to hold the indexed terms.
 This is typically even more memory efficient than packed int/byte
 arrays, though, it does not support ord() so it's not quite a fair
 comparison.
 I had to relax the terms index plugin api for
 PrefixCodedTermsReader/Writer to not assume that the terms index impl
 supports ord.
 I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
 out separate seekCeil and seekFloor in FSTEnum.  Eg we need seekFloor
 when the FST is used as a terms index but seekCeil when it's holding
 all terms in the index (ie which SimpleText uses FSTs for).




[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979284#action_12979284
 ] 

Doron Cohen commented on LUCENE-2840:
-

Is it possible that with this, searching a large optimized index (single 
segment) might be slower than searching an un-optimized index of the same size, 
since the latter enjoys concurrency? If so, is it too wild for more than one 
thread to handle that single segment?




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979292#action_12979292
 ] 

Michael McCandless commented on LUCENE-2843:


In-memory terms dict would be great.  I agree it'd fundamentally change how we 
execute eg the automaton queries (suddenly we can just intersect against the 
terms dict instead of doing the seek/next thing); FuzzyQuery might be a direct 
search through the terms dict instead of first building the LevN DFA; 
respelling similarly...

But, I suspect we'll always have to support the on-disk only option because 
some apps seem to have an insane number of terms.




[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979293#action_12979293
 ] 

Michael McCandless commented on LUCENE-2840:


bq. I think if you want to be super-generic, it's better to defer exact 
threading to the user, instead of doing a one-size-fits-all solution. Else you 
risk conjuring another ConcurrentMergeScheduler.

I think something like CMS (basically a custom ES w/ proper thread 
prio/scheduling) will be necessary here.

Until Java can schedule threads the way an OS schedules processes we'll need to 
emulate it ourselves.

You want long running queries (or, merges) to be gracefully down prioritized so 
that new/fast queries (merges) finish quickly.

And you want searches (merges) to use the allowed concurrency fully.
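A toy illustration of the priority-based scheduling idea (an assumption-laden sketch, not ConcurrentMergeScheduler or any real Lucene class):

```java
import java.util.concurrent.PriorityBlockingQueue;

// Work items carry a priority that a custom executor could use to let
// new/fast queries jump ahead of long-running ones. Only the queue ordering
// is shown; a real scheduler would also re-prioritize in-flight work.
public class PrioritySchedulingSketch {
    static class Work implements Comparable<Work> {
        final int priority;  // lower = more urgent
        final String name;
        Work(int priority, String name) { this.priority = priority; this.name = name; }
        public int compareTo(Work o) { return Integer.compare(priority, o.priority); }
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<Work> queue = new PriorityBlockingQueue<>();
        queue.put(new Work(5, "long-running query"));  // submitted first...
        queue.put(new Work(1, "fresh fast query"));    // ...but loses to lower priority
        System.out.println(queue.poll().name);  // prints: fresh fast query
        System.out.println(queue.poll().name);  // prints: long-running query
    }
}
```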




[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979295#action_12979295
 ] 

Simon Willnauer commented on LUCENE-1260:
-

bq. For trunk, here is what i suggest:
I didn't follow the entire thread here, but is it worth all the effort Robert 
is suggesting, or should we simply land the docvalues branch and make norms a 
DocValues field? The infrastructure is already there, it's integrated into codec, 
and it gives users the freedom to use any Type they want. 

 Norm codec strategy in Similarity
 -

 Key: LUCENE-1260
 URL: https://issues.apache.org/jira/browse/LUCENE-1260
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.3.1
Reporter: Karl Wettin
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: Lucene-1260-1.patch, Lucene-1260-2.patch, 
 Lucene-1260.patch, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260.txt, 
 LUCENE-1260_defaultsim.patch


 The static span and resolution of the 8 bit norms codec might not fit with 
 all applications. 
 My use case requires that 100f-250f is discretized in 60 bags instead of the 
 default.. 10?
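For illustration, a naive quantizer (not Lucene's actual SmallFloat norm codec) makes the span/resolution trade-off concrete, using the 100f-250f range and 60 buckets from the use case above:

```java
// Mapping a float range onto a fixed number of byte buckets shows why a
// static span/resolution cannot suit every application: values closer
// together than one bucket width collapse to the same encoded byte.
public class NormQuantizerSketch {
    static final float MIN = 100f, MAX = 250f;
    static final int BUCKETS = 60;  // the custom resolution from the use case above

    static byte encode(float norm) {
        float clamped = Math.max(MIN, Math.min(MAX, norm));
        return (byte) Math.round((clamped - MIN) / (MAX - MIN) * (BUCKETS - 1));
    }

    static float decode(byte b) {
        return MIN + (b & 0xFF) * (MAX - MIN) / (BUCKETS - 1);
    }

    public static void main(String[] args) {
        // 150f and 151f land in the same bucket (each bucket is 150/59 ~ 2.54 wide):
        System.out.println(encode(150f) == encode(151f));  // prints: true
        System.out.println(decode(encode(175f)));          // a value near 175f, not exactly 175f
    }
}
```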




Lucene-Solr-tests-only-trunk - Build # 3570 - Failure

2011-01-09 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3570/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)




Build Log (for compile errors):
[...truncated 3068 lines...]






[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979303#action_12979303
 ] 

Robert Muir commented on LUCENE-1260:
-

bq. I didn't follow the entire thread here, but is it worth all the effort Robert 
is suggesting, or should we simply land the docvalues branch and make norms a 
DocValues field? The infrastructure is already there, it's integrated into codec, 
and it gives users the freedom to use any Type they want.

Simon, the problem is that encode/decode is in Similarity (instead of somewhere 
else).

So, you would have the same problem with DocValues!




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979305#action_12979305
 ] 

Earwin Burrfoot commented on LUCENE-2843:
-

As I said, there's already a search server with a strictly in-memory (in the mmap 
sense; it can theoretically be paged out) terms dict AND widespread adoption. 
Their users somehow manage.

My guess is that's because people with insane number of terms store various 
crap like unique timestamps as terms. With CSF (attributes in Sphinx lingo), 
and some nice filters that can work over CSF, there's no longer any need to 
stuff your timestamps in the same place you stuff your texts. That can be 
reflected in documentation, and then, suddenly, we can drop on-disk only 
support.




[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979306#action_12979306
 ] 

Earwin Burrfoot commented on LUCENE-2840:
-

A lot of fork-join type frameworks don't even care, even though scheduling 
threads is something people supposedly use them for.
Why? I guess that's due to a low yield/cost ratio.
You frequently quote "progress, not perfection" in relation to the code, so 
why don't we apply this same principle to our threading guarantees?
I don't want to use the allowed concurrency fully. That's not realistic. I want 85% 
of it. That's already a huge leap ahead of single-threaded searches.





[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979308#action_12979308
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
I think with B
we're saying even if the calling thread is bound to DWPT #1, if DWPT #2 is
greater in size and the aggregate RAM usage exceeds the max, using the calling
thread, we take DWPT #2 out of production, flush, and return it?
{quote}
Right -- the thread affinity has nothing to do with which thread gets to flush 
which DWPT.  Once flush is triggered, the thread doing the flushing is free to 
flush any DWPT.

{quote}
Maybe we can simply throw out the DWPT
and put recycling byte[]s and/or pooling DWPTs back in later if it's necessary?
{quote}

OK let's start there and put back re-use only if we see a real perf issue?

bq. What I meant was the following situation: Suppose we have two DWPTs and 
IW.commit() is called. The first DWPT finishes flushing successfully, is 
returned to the pool and idle again. The second DWPT flush fails with an 
aborting exception. 

Hmm, tricky.  I think I'd lean towards keeping segment 1.  Discarding it would 
be inconsistent w/ aborts hit during the "flushed by RAM" case?  EG if seg 1 
was flushed due to RAM usage, succeeds, and then later seg 2 is flushed due to 
RAM usage, but aborts.  In this case we would still keep seg 1?

I think aborting a flush should only lose the docs in that one DWPT (as it is 
today).

Remember, a call to commit may succeed in flushing seg 1 to disk, and updating 
the in-memory segment infos, but on hitting the aborting exc to seg 2, will 
throw that to the caller, not having committed *any* change to the index.  
Exceptions thrown during the prepareCommit (phase 1) part of commit mean 
nothing is changed in the index.

Alternatively... we could abort the entire IW session (as eg we handle OOME 
today) if ever an aborting exception was hit?  This might be cleaner?  But it's 
really a nuke the world option, which scares me.  EG it could be a looong 
indexing session (app doesn't call commit() until the end) and we could be 
throwing away *a lot* of progress.
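The commit semantics described above can be modeled as a toy two-phase commit (hedged sketch; not IndexWriter's real implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Segments flushed during phase 1 (prepareCommit) only become visible if every
// flush succeeds; an aborting exception during phase 1 leaves the previously
// committed index completely unchanged.
public class TwoPhaseCommitSketch {
    final List<String> published = new ArrayList<>();  // segments visible to readers

    // Each pending segment either flushes cleanly or aborts.
    void commit(List<String> pending, List<Boolean> flushOk) {
        List<String> staged = new ArrayList<>(published);
        for (int i = 0; i < pending.size(); i++) {       // phase 1: flush everything
            if (!flushOk.get(i)) {
                // 'published' has not been touched: nothing changed in the index
                throw new RuntimeException("aborting exception in " + pending.get(i));
            }
            staged.add(pending.get(i));
        }
        published.clear();                                // phase 2: atomic publish
        published.addAll(staged);
    }

    public static void main(String[] args) {
        TwoPhaseCommitSketch iw = new TwoPhaseCommitSketch();
        try {
            iw.commit(List.of("seg1", "seg2"), List.of(true, false)); // seg2 aborts
        } catch (RuntimeException e) {
            System.out.println(iw.published);  // prints: [] -- seg1 was not committed
        }
        iw.commit(List.of("seg1"), List.of(true));
        System.out.println(iw.published);      // prints: [seg1]
    }
}
```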

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979313#action_12979313
 ] 

Robert Muir commented on LUCENE-2843:
-

bq. As I said, there's already a search server with strictly in-memory (in mmap 
sense. it can theoretically be paged out) terms dict AND widespread adoption. 
Their users somehow manage

I don't like the reasoning that, just because sphinx does it and their 'users 
manage', that makes it ok.
sphinx also requires mysql, which only recently started supporting *real* utf-8?! 
(not that 3-byte crap they tried to pass off instead)

I don't think we should really be looking there for inspiration.




[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2846:


Attachment: LUCENE-2846.patch

here's an initial patch hacked up by mike and I... it also removes the 
multireader norms method that 
takes a byte[]+offset from IndexReader.

one oddity is that MultiNorms.norms() always returns a filled byte[] here for 
non-atomic readers (never null).
But i think this is ok for MultiNorms; it's not used in searching (only for 
SlowMultiReaderWrapper etc).

i think somehow it would be good to have more tests that cover "doesn't have 
the field" versus "omits norms",
and also (likely not in this issue) we should think about IR's norm-setting 
methods.

I don't like that these use Similarity.getDefault(): it seems we could require 
you to pass in the Sim for the float case.
I also don't like that we expose a public setNorm that takes a byte value 
either!

Long-term we should look at pulling this norm-encoding stuff out of Sim... the 
Sim should just be dealing with floats,
this encoding stuff belongs somewhere else.


 omitTF is viral, but omitNorms is anti-viral.
 -

 Key: LUCENE-2846
 URL: https://issues.apache.org/jira/browse/LUCENE-2846
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2846.patch


 omitTF is viral. if you add document 1 with field foo as omitTF, then 
 document 2 has field foo without omitTF, they are both treated as omitTF.
 but omitNorms is the opposite. if you have a million documents with field 
 foo with omitNorms, then you add just one document without omitting norms, 
 now you suddenly have a million 'real norms'.
 I think it would be good for omitNorms to be viral too, just for consistency, 
 and also to prevent huge byte[]'s.
 but another option is to make omitTF anti-viral, which is more schemaless i 
 guess.
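The viral vs. anti-viral behavior described above amounts to OR-merging vs. AND-merging a per-field flag; a minimal sketch (illustrative only, not FieldInfo's actual code):

```java
// A viral flag, once set by any document, stays set for the field; an
// anti-viral flag is cleared as soon as any document arrives without it.
public class FlagMergeSketch {
    static boolean mergeViral(boolean existing, boolean incoming) {
        return existing || incoming;   // omitTF today: one omitting doc infects the field
    }
    static boolean mergeAntiViral(boolean existing, boolean incoming) {
        return existing && incoming;   // omitNorms today: one non-omitting doc revives norms
    }
    public static void main(String[] args) {
        System.out.println(mergeViral(true, false));      // prints: true
        System.out.println(mergeAntiViral(true, false));  // prints: false
    }
}
```

Making omitNorms viral would mean switching its merge rule to the OR form as well.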




[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979317#action_12979317
 ] 

Simon Willnauer commented on LUCENE-1260:
-

bq. So, you would have the same problem with DocValues!
hmm, not sure if I understand this correctly. How values are encoded / decoded 
depends on the DocValues implementation, which can be customized since it is 
exposed via codec. That means that users of the API always operate on floats and 
the encoding and decoding happen inside the codec and per field. So encode/decode 
in Sim would be obsolete, right?




[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979319#action_12979319
 ] 

Robert Muir commented on LUCENE-1260:
-

{quote}
hmm, not sure if I understand this correctly. How values are encoded / decoded 
depends on the DocValues implementation, which can be customized since it is 
exposed via codec. That means that users of the API always operate on floats and 
the encoding and decoding happen inside the codec and per field. So encode/decode 
in Sim would be obsolete, right?
{quote}

the issues remaining here mostly involve fake norms, for the omitNorms case 
(also empty norms, I think).
So the stuff I listed must be fixed regardless, to clean up the fake norms 
case; it does not matter whether real norms are encoded with CSF or not.

Doing things like cleaning up how we deal with fake norms, and removing 
Similarity.get/setDefault, is completely unrelated to DocValues... it's just 
stuff we must fix.

As long as we have these statics like Similarity.get/setDefault, it's not even 
useful to think about things like flexible scoring or per-field Similarity...!





[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979326#action_12979326
 ] 

Michael McCandless commented on LUCENE-1260:


I think we need to stop faking norms, independent of whether/when we cut over 
to CSF to store norms / index stats?

Ie the two issues are orthogonal (and both are important!).




[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2011-01-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979328#action_12979328
 ] 

Yonik Seeley commented on LUCENE-1260:
--

bq. I think we need to stop faking norms, independent of whether/when we 
cutover to CSF to store norms / index stats? 

+1, it was only intended to be a short-term thing for back compat (see way back 
to LUCENE-448)




[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979331#action_12979331
 ] 

Robert Muir commented on LUCENE-2846:
-

An alternative that would totally clear up the faking here, which Mike thought 
of:

If we can somehow differentiate between omitNorms (null) and 'doesn't have the 
field' (say, an exception), we wouldn't need to fake. In MultiNorms we could 
then safely return null if any reader returns null, but throw an exception if 
all readers throw an exception.
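The merge rule described above can be sketched as a small helper. This is a hypothetical illustration only: the names SubNorms, FieldMissingException, and mergedNorms are invented here, not Lucene APIs.

```java
import java.util.Arrays;
import java.util.List;

public class NormsMergeSketch {
    /** Marker for "this sub-reader does not have the field at all". */
    public static class FieldMissingException extends RuntimeException {}

    /** A sub-reader either throws (no such field) or returns norms (null = omitNorms). */
    public interface SubNorms {
        byte[] norms() throws FieldMissingException;
    }

    /** Merge rule: null if any reader omits norms; throw only if every reader lacks the field. */
    public static byte[] mergedNorms(List<SubNorms> readers) {
        boolean anyHasField = false;
        for (SubNorms r : readers) {
            try {
                byte[] n = r.norms();
                anyHasField = true;
                if (n == null) {
                    return null; // omitted norms win across the merge, no faking needed
                }
            } catch (FieldMissingException e) {
                // this segment simply lacks the field; keep looking
            }
        }
        if (!anyHasField) {
            throw new FieldMissingException(); // no reader has the field
        }
        return new byte[0]; // placeholder: real code would concatenate per-segment norms
    }

    public static void main(String[] args) {
        List<SubNorms> mixed = Arrays.asList(() -> null, () -> new byte[4]);
        System.out.println(mergedNorms(mixed)); // null: one reader omitted norms
    }
}
```

The point of the sketch is that "field missing" and "norms omitted" become distinguishable signals, so no fake norms array ever has to be materialized.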


 omitTF is viral, but omitNorms is anti-viral.
 -

 Key: LUCENE-2846
 URL: https://issues.apache.org/jira/browse/LUCENE-2846
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2846.patch


 omitTF is viral. if you add document 1 with field foo as omitTF, then 
 document 2 has field foo without omitTF, they are both treated as omitTF.
 but omitNorms is the opposite. if you have a million documents with field 
 foo with omitNorms, then you add just one document without omitting norms, 
 now you suddenly have a million 'real norms'.
 I think it would be good for omitNorms to be viral too, just for consistency, 
 and also to prevent huge byte[]'s.
 but another option is to make omitTF anti-viral, which is more schemaless i 
 guess.
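The asymmetry in the description above boils down to two different boolean merge rules. A toy illustration (hypothetical helper, not the actual FieldInfo merging code):

```java
public class ViralityDemo {
    // omitTF is viral: if either field instance omits term frequencies,
    // the merged field does too.
    static boolean mergeOmitTf(boolean a, boolean b) {
        return a || b;
    }

    // omitNorms is currently anti-viral: a single field instance with norms
    // forces real norms back on for every document.
    static boolean mergeOmitNorms(boolean a, boolean b) {
        return a && b;
    }

    public static void main(String[] args) {
        System.out.println(mergeOmitTf(true, false));    // true  (viral: OR)
        System.out.println(mergeOmitNorms(true, false)); // false (anti-viral: AND)
    }
}
```

Making omitNorms viral, as proposed, would mean switching its rule from AND to OR for consistency.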




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979334#action_12979334
 ] 

Michael McCandless commented on LUCENE-2843:


Yes doc values should cut back on these large term dicts.

But, I'm not a fan of pure disk-based terms dict.  Expecting the OS to make 
good decisions on what gets swapped out is risky -- Lucene is better informed 
than the OS on which data structures are worth spending RAM on (norms, terms 
index, field cache, del docs).

If indeed the terms dict (thanks to FSTs) becomes small enough to fit in RAM, 
then we should load it into RAM (and do away w/ the terms index).

 Add variable-gap terms index impl.
 --

 Key: LUCENE-2843
 URL: https://issues.apache.org/jira/browse/LUCENE-2843
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2843.patch, LUCENE-2843.patch


 PrefixCodedTermsReader/Writer (used by all real core codecs) already
 supports pluggable terms index impls.
 The only impl we have now is FixedGapTermsIndexReader/Writer, which
 picks every Nth (default 32) term and holds it in efficient packed
 int/byte arrays in RAM.  This is already an enormous improvement (RAM
 reduction, init time) over 3.x.
 This patch adds another impl, VariableGapTermsIndexReader/Writer,
 which lets you specify an arbitrary IndexTermSelector to pick which
 terms are indexed, and then uses an FST to hold the indexed terms.
 This is typically even more memory efficient than packed int/byte
 arrays, though, it does not support ord() so it's not quite a fair
 comparison.
 I had to relax the terms index plugin api for
 PrefixCodedTermsReader/Writer to not assume that the terms index impl
 supports ord.
 I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
 out separate seekCeil and seekFloor in FSTEnum.  Eg we need seekFloor
 when the FST is used as a terms index but seekCeil when it's holding
 all terms in the index (ie which SimpleText uses FSTs for).
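The seekCeil/seekFloor distinction mentioned above is the same floor-vs-ceiling semantics that java.util.TreeSet exposes. A quick illustration, with TreeSet standing in for the FST (this is not the FSTEnum API):

```java
import java.util.TreeSet;

public class SeekSemanticsDemo {
    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add("apple");
        terms.add("cherry");
        terms.add("plum");

        // seekCeil-style: smallest entry >= the target
        // (what you want when the structure holds every term, as in SimpleText)
        System.out.println(terms.ceiling("banana")); // cherry

        // seekFloor-style: largest entry <= the target
        // (what you want when the structure is a sparse terms index:
        //  find the nearest indexed term at or before the target, then scan)
        System.out.println(terms.floor("banana"));   // apple
    }
}
```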




[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979337#action_12979337
 ] 

Michael McCandless commented on LUCENE-2840:


bq. You frequently quote progress, not perfection in relation to the code, 
but why don't we apply this same principle to our threading guarantees?

Oh we should definitely apply progress not perfection here -- in fact we 
already are: for starters (today), we bind concurrency to segments (so eg an 
optimized index has no concurrency), and we just use an ES (punt this thread 
scheduling problem to the caller).  This is better than nothing, but not good 
enough -- we can do better.

There's another quote that applies here: big dreams, small steps.  My comment 
above is dreaming but when it comes time to actually get the real work done / 
making progress towards that dream, of course we take baby steps / progress not 
perfection.

Design discussions should start w/ the big dreams but then once you've got a 
rough sense of where you want to get to in the future you shift back to the 
baby steps you do today, in the direction of that future goal.

Maybe I should wrap my comments in /dream tags and /babysteps tags!

 Multi-Threading in IndexSearcher (after removal of MultiSearcher and 
 ParallelMultiSearcher)
 ---

 Key: LUCENE-2840
 URL: https://issues.apache.org/jira/browse/LUCENE-2840
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Search
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 4.0


 Spin-off from parent issue:
 {quote}
 We should discuss about how many threads should be spawned. If you have an 
 index with many segments, even small ones, I think only the larger segments 
 should be separate threads, all others should be handled sequentially. So 
 maybe add a maxThreads cound, then sort the IndexReaders by maxDoc and then 
 only spawn maxThreads-1 threads for the bigger readers and then one 
 additional thread for the rest?
 {quote}
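The scheme in the quoted description can be sketched as a simple partitioning step. Illustrative only: real code would slice IndexReaders, not bare maxDoc counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SegmentPartitionSketch {
    /**
     * Sort segment sizes (maxDoc) descending, give the maxThreads-1 largest
     * segments a slice of their own, and put all remaining small segments
     * into one final slice to be handled sequentially.
     */
    static List<List<Integer>> partition(List<Integer> maxDocs, int maxThreads) {
        List<Integer> sorted = new ArrayList<>(maxDocs);
        sorted.sort(Collections.reverseOrder());
        List<List<Integer>> slices = new ArrayList<>();
        int solo = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < solo; i++) {
            slices.add(Collections.singletonList(sorted.get(i)));
        }
        if (solo < sorted.size()) {
            // one slice handles the long tail of small segments sequentially
            slices.add(new ArrayList<>(sorted.subList(solo, sorted.size())));
        }
        return slices;
    }

    public static void main(String[] args) {
        // five segments, three threads -> [[900], [100], [40, 5, 3]]
        System.out.println(partition(Arrays.asList(100, 5, 3, 900, 40), 3));
    }
}
```

Each resulting slice would then be submitted as one task to the executor, so thread count is bounded while the biggest segments still get dedicated workers.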




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979345#action_12979345
 ] 

Yonik Seeley commented on LUCENE-2843:
--

bq. Their users somehow manage. 

That neglects to count those who are not users because they could not manage 
with the limitations ;-)

Anyway, being able to optionally keep the term dict in memory, per field, if 
it's below a certain limit (terms/memory or whatever), would be very cool!




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979346#action_12979346
 ] 

Earwin Burrfoot commented on LUCENE-2843:
-

bq. I don't like the reasoning that, just because sphinx does it and their 
'users manage', that makes it ok.
I'm in no way advocating it as an all-round better solution. It has its 
wrinkles just as anything else.
My reasoning is merely that the alternative exists, and it is viable, as proven 
by pretty high-profile users.
They have a memory-resident term dictionary, and it works; I have heard no 
complaints regarding this, ever.

bq. sphinx also requires mysql
Have you read anything at all? It has an integration ready for the layman user 
who just wants to stick fulltext search into their little app, but it is in no 
way reliant on it.
Sphinx is a direct alternative to Solr.

{quote}
But, I'm not a fan of pure disk-based terms dict. Expecting the OS to make good 
decisions on what gets swapped out is risky - Lucene is better informed than 
the OS on which data structures are worth spending RAM on (norms, terms index, 
field cache, del docs).
If indeed the terms dict (thanks to FSTs) becomes small enough to fit in RAM, 
then we should load it into RAM (and do away w/ the terms index).
{quote}
That's a bit delusional. If a system is forced to swap, it'll swap your 
explicitly managed RAM just as readily as memory-mapped files. I've seen this 
countless times.
But then you have a number of benefits, like sharing the filesystem cache when 
opening the same file multiple times, offloading things from the Java heap 
(which is almost always a good thing), and the fastest load-into-memory times 
possible.


Sorry if I sound offensive at times, but, damn, there's a whole world of 
simple and efficient code lying ahead in that direction :)




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979347#action_12979347
 ] 

Robert Muir commented on LUCENE-2843:
-

bq. Have you read anything at all?

Nope, haven't looked at their code... I think I stopped at the documentation 
when I saw how they analyzed text!

bq. Sorry, if I sound offending at times, but, damn, there's a whole world of 
simple and efficient code lying ahead in that direction

So where is the problem?

You can make your own all-on-disk impl, or all-in-RAM impl, and contribute it. 
And you don't have to implement a terms dict cache; that's contained in the 
implementation.

My problem is that we shouldn't assume all users can fit all their terms in 
RAM.

I think it's great to offer alternative impls that work all in RAM, and maybe 
if the terms dict < X, where X is some configurable value, even consider using 
these automatically in the standard codec... but I don't see any benefit of 
'forcing' this when we have this whole flexible indexing thing!
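The "terms dict < X" suggestion above amounts to size-based dispatch between implementations. A toy sketch (all names here are hypothetical; nothing in it is the codec API):

```java
public class TermsDictChooser {
    interface TermsDict {}
    static class AllInRamTermsDict implements TermsDict {}  // small dicts: load fully
    static class OnDiskTermsDict implements TermsDict {}    // huge dicts: stay on disk

    /** Pick an impl automatically: RAM-resident only below a configurable threshold. */
    static TermsDict choose(long termCount, long threshold) {
        return termCount < threshold ? new AllInRamTermsDict() : new OnDiskTermsDict();
    }

    public static void main(String[] args) {
        // a modest field stays in RAM; a pathological one falls back to disk
        System.out.println(choose(50_000, 1_000_000).getClass().getSimpleName());
        System.out.println(choose(2_000_000_000L, 1_000_000).getClass().getSimpleName());
    }
}
```

Applied per field, this kind of dispatch would let ordinary indexes enjoy an all-in-RAM dictionary without breaking extreme cases like Test2BTerms.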





[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979348#action_12979348
 ] 

Yonik Seeley commented on LUCENE-2843:
--

bq. My reasoning is merely that alternative exists, and it is viable. As proven 
by pretty high-profile users.

Actually, I sort of agree. I read the "in memory" too fast and didn't realize 
you were talking about memory-mapped.
There are other parts of Sphinx that are kept directly in memory (not 
memory-mapped) and do limit its single-node scalability too much, IMO.
Unfortunately, Java has additional overhead wrt mmap, and you also can't do 
some things that you could do in C. All this means is that trade-offs that 
made sense for C/C++ solutions may or may not make sense for Java solutions.




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979353#action_12979353
 ] 

Robert Muir commented on LUCENE-2843:
-

bq. Unfortunately, Java has additional overhead wrt mmap

It's not just that: you can't assume mmap even works (32-bit platforms, even 
some troubles on 64-bit Windows).
Because this is a search engine library, not just a server on 64-bit Linux 
only, we need to support other situations, like 32-bit users doing desktop 
search.

In other words, Test2BTerms in src/test should pass on my 32-bit Windows 
machine with whatever we default to.




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979366#action_12979366
 ] 

Earwin Burrfoot commented on LUCENE-2843:
-

bq. Nope, havent looked at their code... i think i stopped at the documentation 
when i saw how they analyzed text!
All my points are contained within their documentation. No need to look at the 
code (it's as shady as Lucene's).
In the same manner, Lucene had crappy analysis for years, until you took hold 
of the (Unicode) police baton.
So let's not allow color differences between our analyzers to affect our 
judgement on other parts of ours : )

bq. In other words, Test2BTerms in src/test should pass on my 32-bit windows 
machine with whatever we default to.
I'm questioning whether there is any legal, adequate reason to have that many 
terms.
I'm agreeing on the mmap+32-bit/mmap+Windows point for a reasonable number of 
terms, though :/

A hybrid solution, with term-dict being loaded completely into memory (either 
via mmap, or into arrays) on per-field basis, is probably best in the end, 
however sad it may be.




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979372#action_12979372
 ] 

Robert Muir commented on LUCENE-2843:
-

bq. A hybrid solution, with term-dict being loaded completely into memory 
(either via mmap, or into arrays) on per-field basis, is probably best in the 
end, however sad it may be.

What's the sad part again? Why does it bother you if there is another 
alternative codec setup or terms dict implementation, if you aren't using it?
Should we also have only RAMDirectory and MMapDirectory, and is it sad that we 
have NIOFSDirectory?





[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2846:


Attachment: LUCENE-2846.patch

here's an updated patch:
* IR.setNorm(float) is also removed, forcing the user to use the correct 
similarity instead of us using the wrong one (the static).
* MultiNorms doesn't fake norms anymore; instead it handles the case of a 
non-existent field versus omitted norms.
* When a document doesn't have a field, its (undefined) norms are written as 
zero bytes instead of Similarity.getDefault().encodeNorm(1f).
* All uses of Similarity.get/setDefault are now gone in Lucene core, except 
for in IndexSearcher and IndexWriterConfig.





[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2846:


Attachment: LUCENE-2846.patch

Sorry, I had a piece of backwards logic in MultiNorms.

Of course all tests pass either way, which is why we need a good mixed-schema 
test (with RIW) for this issue before it can go in (no matter what we do).

 omitTF is viral, but omitNorms is anti-viral.
 -

 Key: LUCENE-2846
 URL: https://issues.apache.org/jira/browse/LUCENE-2846
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch


 omitTF is viral. if you add document 1 with field foo as omitTF, then 
 document 2 has field foo without omitTF, they are both treated as omitTF.
 but omitNorms is the opposite. if you have a million documents with field 
 foo with omitNorms, then you add just one document without omitting norms, 
 now you suddenly have a million 'real norms'.
 I think it would be good for omitNorms to be viral too, just for consistency, 
 and also to prevent huge byte[]'s.
 but another option is to make omitTF anti-viral, which is more schemaless i 
 guess.




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979382#action_12979382
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

bq. Once flush is triggered, the thread doing the flushing is free to flush any 
DWPT.

OK.

bq. OK let's start there and put back re-use only if we see a real perf issue?

I think that's best.  Balancing RAM isn't implemented in the branch, and we can't 
predict the future usage of DWPT(s) (which could languish consuming RAM with 
byte[]s well after they're flushed due to a sudden drop in the number of 
calling threads external to IW).

{quote}But it's really a nuke the world option which scares me. EG it could 
be a looong indexing session (app doesn't call commit() until the end) and we 
could be throwing away alot of progress.{quote}

Right.  Another option is, on commit, to try to flush all segments, meaning even 
if one DWPT/segment aborts, we continue on with the other DWPTs (ie, a best 
effort).  Then perhaps throw an exception with a report of which segment 
flushes succeeded, or simply return a report object detailing what happened 
during commit (somewhat expert usage though).  Either way I think we need to 
give a few options to the user, then choose a default and see if it sticks.  
The default should probably be best effort.



 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.




[jira] Updated: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key

2011-01-09 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-2855:
-

Attachment: lucene_2855_adriano_crestani_2011_01_09.patch

Thanks for pointing out the problems, here is the new patch

 Contrib queryparser should not use CharSequence as Map key
 --

 Key: LUCENE-2855
 URL: https://issues.apache.org/jira/browse/LUCENE-2855
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 3.0.3
Reporter: Adriano Crestani
Assignee: Adriano Crestani
 Fix For: 3.0.4

 Attachments: lucene_2855_adriano_crestani_2011_01_08.patch, 
 lucene_2855_adriano_crestani_2011_01_09.patch


 Today, contrib query parser uses Map<CharSequence,...> in many different 
 places, which may lead to problems, since the CharSequence interface does not 
 enforce the implementation of hashCode and equals methods. Today, it's 
 causing a problem with the QueryTreeBuilder.setBuilder(CharSequence,QueryBuilder) 
 method, which does not work as expected.
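The pitfall described in the issue can be reproduced in a few lines of plain Java (a minimal sketch; the map contents here are made up, not from the patch):

```java
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
    public static void main(String[] args) {
        Map<CharSequence, String> builders = new HashMap<>();
        builders.put("field", "someBuilder");

        // An equal character sequence of a different type misses, because
        // StringBuilder inherits identity-based equals/hashCode from Object.
        CharSequence sameChars = new StringBuilder("field");
        System.out.println(builders.get("field"));   // someBuilder
        System.out.println(builders.get(sameChars)); // null
    }
}
```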




[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979390#action_12979390
 ] 

David Smiley commented on LUCENE-2611:
--

Steven,
  I don't know if another issue should be created, but there are some extra 
additions to the IntelliJ setup that would be nice.  In vcs.xml, add this:
{code:xml}
<component name="IssueNavigationConfiguration">
  <option name="links">
    <list>
      <IssueNavigationLink>
        <option name="issueRegexp" value="[A-Z]+\-\d+" />
        <option name="linkRegexp" value="http://issues.apache.org/jira/browse/$0" />
      </IssueNavigationLink>
    </list>
  </option>
</component>
{code}
And in workspace.xml, /project/compone...@name=ChangeListManager]/ add 
{code:xml}
<ignored path=".idea/" />
<ignored mask="*.iml" />
{code}
And perhaps the copyright setup should be set up for ASL.


 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)

2011-01-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979395#action_12979395
 ] 

Jason Rutherglen commented on LUCENE-2186:
--

Out of curiosity, re: LUCENE-2312, are we planning on putting CSF into Lucene 
4.x?  What's left to be done?

 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py


 I created an initial basic impl for storing index values (ie
 column-stride value storage).  This is still a work in progress... but
 the approach looks compelling.  I'm posting my current status/patch
 here to get feedback/iterate, etc.
 The code is standalone now, and lives under new package
 oal.index.values (plus some util changes, refactorings) -- I have yet
 to integrate into Lucene so eg you can mark that a given Field's value
 should be stored into the index values, sorting will use these values
 instead of field cache, etc.
 It handles 3 types of values:
   * Six variants of byte[] per doc, all combinations of fixed vs
 variable length, and stored either straight (good for eg a
 title field), deref (good when many docs share the same value,
 but you won't do any sorting) or sorted.
   * Integers (variable bit precision used as necessary, ie this can
 store byte/short/int/long, and all precisions in between)
   * Floats (4 or 8 byte precision)
 String fields are stored as the UTF8 byte[].  This patch adds a
 BytesRef, which does the same thing as flex's TermRef (we should merge
 them).
 This patch also adds basic initial impl of PackedInts (LUCENE-1990);
 we can swap that out if/when we get a better impl.
 This storage is dense (like field cache), so it's appropriate when the
 field occurs in all/most docs.  It's just like field cache, except the
 reading API is a get() method invocation, per document.
 Next step is to do basic integration with Lucene, and then compare
 sort performance of this vs field cache.
 For the sort by String value case, I think RAM usage & GC load of 
 this index values API should be much better than field cache, since 
 it does not create object per document (instead shares big long[] and
 byte[] across all docs), and because the values are stored in RAM as
 their UTF8 bytes.
 There are abstract Writer/Reader classes.  The current reader impls
 are entirely RAM resident (like field cache), but the API is (I think)
 agnostic, ie, one could make an MMAP impl instead.
 I think this is the first baby step towards LUCENE-1231.  Ie, it
 cannot yet update values, and the reading API is fully random-access
 by docID (like field cache), not like a posting list, though I
 do think we should add an iterator() api (to return flex's DocsEnum)
 -- eg I think this would be a good way to track avg doc/field length
 for BM25/lnu.ltc scoring.
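As a rough illustration of the "variable bit precision" idea behind the PackedInts mentioned above, here is a toy packer. It assumes a bit width that divides 64 and non-negative values; the real implementation (LUCENE-1990) handles arbitrary widths and is far more sophisticated:

```java
// Toy packed-int storage: n small values stored at a fixed bit width in a
// long[], instead of one full int/long per value. Illustrative only.
public class PackedIntsDemo {
    // bitsPerValue must divide 64 in this sketch; values must fit in that width.
    static long[] pack(int[] values, int bitsPerValue) {
        int perWord = 64 / bitsPerValue;
        long[] packed = new long[(values.length + perWord - 1) / perWord];
        for (int i = 0; i < values.length; i++) {
            packed[i / perWord] |= ((long) values[i]) << ((i % perWord) * bitsPerValue);
        }
        return packed;
    }

    static int get(long[] packed, int bitsPerValue, int index) {
        int perWord = 64 / bitsPerValue;
        long mask = (1L << bitsPerValue) - 1;
        return (int) ((packed[index / perWord] >>> ((index % perWord) * bitsPerValue)) & mask);
    }

    public static void main(String[] args) {
        int[] vals = {3, 7, 0, 12, 9};
        long[] packed = pack(vals, 4); // 16 four-bit values per long
        for (int i = 0; i < vals.length; i++) {
            System.out.println(get(packed, 4, i));
        }
    }
}
```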




Lucene-Solr-tests-only-trunk - Build # 3586 - Failure

2011-01-09 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3586/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple

Error Message:
expected:<3> but was:<2>

Stack Trace:
junit.framework.AssertionFailedError: expected:<3> but was:<2>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:126)




Build Log (for compile errors):
[...truncated 8211 lines...]






[jira] Assigned: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong

2011-01-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2839:
-

Assignee: Uwe Schindler

 Visibility of Scorer.score(Collector, int, int) is wrong
 

 Key: LUCENE-2839
 URL: https://issues.apache.org/jira/browse/LUCENE-2839
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 The method for scoring subsets in Scorer has wrong visibility; it's marked 
 protected, but protected methods should not be called from other classes. 
 Protected methods are intended for methods that should be overridden by 
 subclasses and are called by (often) final methods of the same class. They 
 should never be called from foreign classes.
 This method is called from another class out-of-scope: BooleanScorer(2) - so 
 it must be public, but it's protected. This does not lead to a compiler error 
 because BS(2) is in same package, but may lead to problems if subclasses from 
 other packages override it. When implementing LUCENE-2838 I hit a trap, as I 
 thought this method should only be called from the class or Scorer itself, 
 but in fact its called from outside, leading to bugs, because I had not 
 overridden it. As ConstantScorer did not use it I have overridden it with 
 throw UOE and suddenly BooleanQuery was broken, which made it clear that it's 
 called from outside (which is not the intention of protected methods).
 We cannot fix this in 3.x, as it would break backwards for classes that 
 overwrite this method, but we can fix visibility in trunk.




[jira] Updated: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong

2011-01-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2839:
--

Attachment: LUCENE-2839-3.x.patch
LUCENE-2839.patch

Here is the patch for trunk and 3.x; will commit soon. In 3.x I simply added a 
note to Scorer's javadocs telling the user that subclasses in user code 
should declare the method as public to ease the transition to 4.0.

 Visibility of Scorer.score(Collector, int, int) is wrong
 

 Key: LUCENE-2839
 URL: https://issues.apache.org/jira/browse/LUCENE-2839
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-2839-3.x.patch, LUCENE-2839.patch


 The method for scoring subsets in Scorer has wrong visibility; it's marked 
 protected, but protected methods should not be called from other classes. 
 Protected methods are intended for methods that should be overridden by 
 subclasses and are called by (often) final methods of the same class. They 
 should never be called from foreign classes.
 This method is called from another class out-of-scope: BooleanScorer(2) - so 
 it must be public, but it's protected. This does not lead to a compiler error 
 because BS(2) is in same package, but may lead to problems if subclasses from 
 other packages override it. When implementing LUCENE-2838 I hit a trap, as I 
 thought this method should only be called from the class or Scorer itself, 
 but in fact its called from outside, leading to bugs, because I had not 
 overridden it. As ConstantScorer did not use it I have overridden it with 
 throw UOE and suddenly BooleanQuery was broken, which made it clear that it's 
 called from outside (which is not the intention of protected methods).
 We cannot fix this in 3.x, as it would break backwards for classes that 
 overwrite this method, but we can fix visibility in trunk.




[jira] Resolved: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong

2011-01-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2839.
---

Resolution: Fixed

Committed trunk revision: 1057010,
Committed javadoc updates revision: 1057011

 Visibility of Scorer.score(Collector, int, int) is wrong
 

 Key: LUCENE-2839
 URL: https://issues.apache.org/jira/browse/LUCENE-2839
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-2839-3.x.patch, LUCENE-2839.patch


 The method for scoring subsets in Scorer has wrong visibility; it's marked 
 protected, but protected methods should not be called from other classes. 
 Protected methods are intended for methods that should be overridden by 
 subclasses and are called by (often) final methods of the same class. They 
 should never be called from foreign classes.
 This method is called from another class out-of-scope: BooleanScorer(2) - so 
 it must be public, but it's protected. This does not lead to a compiler error 
 because BS(2) is in same package, but may lead to problems if subclasses from 
 other packages override it. When implementing LUCENE-2838 I hit a trap, as I 
 thought this method should only be called from the class or Scorer itself, 
 but in fact its called from outside, leading to bugs, because I had not 
 overridden it. As ConstantScorer did not use it I have overridden it with 
 throw UOE and suddenly BooleanQuery was broken, which made it clear that it's 
 called from outside (which is not the intention of protected methods).
 We cannot fix this in 3.x, as it would break backwards for classes that 
 overwrite this method, but we can fix visibility in trunk.




[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)

2011-01-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979404#action_12979404
 ] 

Simon Willnauer commented on LUCENE-2186:
-

bq. Out of curiosity, re: LUCENE-2312, are we planning on putting CSF into 
Lucene 4.x? What's left to be done?
we are very close - to land on trunk there is about an evening of work left. 
JDoc is missing here and there plus some tests for FieldComparators - that's it!

 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py


 I created an initial basic impl for storing index values (ie
 column-stride value storage).  This is still a work in progress... but
 the approach looks compelling.  I'm posting my current status/patch
 here to get feedback/iterate, etc.
 The code is standalone now, and lives under new package
 oal.index.values (plus some util changes, refactorings) -- I have yet
 to integrate into Lucene so eg you can mark that a given Field's value
 should be stored into the index values, sorting will use these values
 instead of field cache, etc.
 It handles 3 types of values:
   * Six variants of byte[] per doc, all combinations of fixed vs
 variable length, and stored either straight (good for eg a
 title field), deref (good when many docs share the same value,
 but you won't do any sorting) or sorted.
   * Integers (variable bit precision used as necessary, ie this can
 store byte/short/int/long, and all precisions in between)
   * Floats (4 or 8 byte precision)
 String fields are stored as the UTF8 byte[].  This patch adds a
 BytesRef, which does the same thing as flex's TermRef (we should merge
 them).
 This patch also adds basic initial impl of PackedInts (LUCENE-1990);
 we can swap that out if/when we get a better impl.
 This storage is dense (like field cache), so it's appropriate when the
 field occurs in all/most docs.  It's just like field cache, except the
 reading API is a get() method invocation, per document.
 Next step is to do basic integration with Lucene, and then compare
 sort performance of this vs field cache.
 For the sort by String value case, I think RAM usage & GC load of 
 this index values API should be much better than field cache, since 
 it does not create object per document (instead shares big long[] and
 byte[] across all docs), and because the values are stored in RAM as
 their UTF8 bytes.
 There are abstract Writer/Reader classes.  The current reader impls
 are entirely RAM resident (like field cache), but the API is (I think)
 agnostic, ie, one could make an MMAP impl instead.
 I think this is the first baby step towards LUCENE-1231.  Ie, it
 cannot yet update values, and the reading API is fully random-access
 by docID (like field cache), not like a posting list, though I
 do think we should add an iterator() api (to return flex's DocsEnum)
 -- eg I think this would be a good way to track avg doc/field length
 for BM25/lnu.ltc scoring.




[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-09 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979405#action_12979405
 ] 

Steven Rowe commented on LUCENE-2611:
-

Hi David,

Thanks for the input.

I don't think another issue is necessary.

I added the {{.idea/vcs.xml}} change to auto-linkify issues in log comments.  I 
didn't know this option existed.  Where does it do the auto-linkification? I 
don't see it in the log comment editor, and I also don't see it when I 
browse an individual file's log messages (using the popup from the svnbar 
plugin toolbar icon).

But I did not add the {{.idea/workspace.xml}} change you propose (ignoring 
{{.idea/}} and {{.iml}} files), because those files are already ignored via 
{{svn:ignore}} properties.  When I added them, nothing changed for me - the 
files still show up in the project tree view greyed out, just as they did 
before I added the option.

I'm not sure it's a good idea to add copyright setup for ASL - I don't know 
enough about what this plugin does.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)

2011-01-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979407#action_12979407
 ] 

Jason Rutherglen commented on LUCENE-2186:
--

bq. we are very close - to land on trunk there is about an evening of work 
left. JDoc is missing here and there plus some tests for FieldComparators - 
thats it!

Nice!  Once it's in I'll try to get started on the RT field cache/doc values, 
which can likely be implemented and tested somewhat independently of the RT 
inverted index.

 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py


 I created an initial basic impl for storing index values (ie
 column-stride value storage).  This is still a work in progress... but
 the approach looks compelling.  I'm posting my current status/patch
 here to get feedback/iterate, etc.
 The code is standalone now, and lives under new package
 oal.index.values (plus some util changes, refactorings) -- I have yet
 to integrate into Lucene so eg you can mark that a given Field's value
 should be stored into the index values, sorting will use these values
 instead of field cache, etc.
 It handles 3 types of values:
   * Six variants of byte[] per doc, all combinations of fixed vs
 variable length, and stored either straight (good for eg a
 title field), deref (good when many docs share the same value,
 but you won't do any sorting) or sorted.
   * Integers (variable bit precision used as necessary, ie this can
 store byte/short/int/long, and all precisions in between)
   * Floats (4 or 8 byte precision)
 String fields are stored as the UTF8 byte[].  This patch adds a
 BytesRef, which does the same thing as flex's TermRef (we should merge
 them).
 This patch also adds basic initial impl of PackedInts (LUCENE-1990);
 we can swap that out if/when we get a better impl.
 This storage is dense (like field cache), so it's appropriate when the
 field occurs in all/most docs.  It's just like field cache, except the
 reading API is a get() method invocation, per document.
 Next step is to do basic integration with Lucene, and then compare
 sort performance of this vs field cache.
 For the sort by String value case, I think RAM usage & GC load of 
 this index values API should be much better than field cache, since 
 it does not create object per document (instead shares big long[] and
 byte[] across all docs), and because the values are stored in RAM as
 their UTF8 bytes.
 There are abstract Writer/Reader classes.  The current reader impls
 are entirely RAM resident (like field cache), but the API is (I think)
 agnostic, ie, one could make an MMAP impl instead.
 I think this is the first baby step towards LUCENE-1231.  Ie, it
 cannot yet update values, and the reading API is fully random-access
 by docID (like field cache), not like a posting list, though I
 do think we should add an iterator() api (to return flex's DocsEnum)
 -- eg I think this would be a good way to track avg doc/field length
 for BM25/lnu.ltc scoring.




Lucene-Solr-tests-only-trunk - Build # 3590 - Failure

2011-01-09 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3590/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)




Build Log (for compile errors):
[...truncated 3101 lines...]






[jira] Updated: (SOLR-2272) Join

2011-01-09 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2272:
-

Component/s: search

 Join
 

 Key: SOLR-2272
 URL: https://issues.apache.org/jira/browse/SOLR-2272
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2272.patch


 Limited join functionality for Solr, mapping one set of IDs matching a query 
 to another set of IDs, based on the indexed tokens of the fields.
 Example:
 fq={!join from=parent_ptr to=parent_id}child_doc:query
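As an illustration of the idea (a hypothetical sketch, not Solr's implementation), the join can be thought of as collecting the {{from}}-field values of the documents matching the inner query, then selecting the documents whose {{to}} field holds one of those values:

```java
import java.util.*;

// Hypothetical sketch of the join idea (not Solr's implementation):
// collect the "from" field values of docs matching the inner query,
// then select docs whose "to" field contains any of those values.
public class JoinSketch {
    record Doc(String id, Map<String, String> fields) {}

    static List<String> join(List<Doc> docs, Set<String> matchingIds,
                             String from, String to) {
        // Gather join keys from the matched documents.
        Set<String> keys = new HashSet<>();
        for (Doc d : docs) {
            if (matchingIds.contains(d.id())) {
                keys.add(d.fields().get(from));
            }
        }
        // Map to the documents whose "to" field matches a collected key.
        List<String> result = new ArrayList<>();
        for (Doc d : docs) {
            if (keys.contains(d.fields().get(to))) {
                result.add(d.id());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("c1", Map.of("parent_ptr", "p1", "parent_id", "")),
            new Doc("p1", Map.of("parent_ptr", "", "parent_id", "p1")),
            new Doc("p2", Map.of("parent_ptr", "", "parent_id", "p2")));
        // Child doc c1 matched the inner query; the join maps it to parent p1.
        System.out.println(join(docs, Set.of("c1"), "parent_ptr", "parent_id")); // [p1]
    }
}
```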




[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-09 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979454#action_12979454
 ] 

Chris Male commented on LUCENE-2611:


bq. I'm not sure it's a good idea to add copyright setup for ASL - I don't know 
enough about what this plugin does.

I've used the copyright plugin a lot and it's a great way to ensure that the ASL 
is added to any new files.  Might be useful to add it to reduce the hassle for 
new contributors.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




[jira] Issue Comment Edited: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-09 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979454#action_12979454
 ] 

Chris Male edited comment on LUCENE-2611 at 1/9/11 8:51 PM:


bq. I'm not sure it's a good idea to add copyright setup for ASL - I don't know 
enough about what this plugin does.

I've used the copyright plugin a lot and it's a great way to ensure that the ASL 
is added to any new files.  Might be useful to add it to reduce the hassle for 
new contributors.

  was (Author: cmale):
.bq I'm not sure it's a good idea to add copyright setup for ASL - I don't 
know enough about what this plugin does.

I've used the copyright plugin a lot and its a great way to ensure that the ASL 
is added to any new files.  Might be useful to add it to reduce the hassle for 
new contributors.
  



Lucene-trunk - Build # 1421 - Failure

2011-01-09 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)




Build Log (for compile errors):
[...truncated 7055 lines...]






[jira] Created: (SOLR-2310) DocBuilder's getTimeElapsedSince Error

2011-01-09 Thread tom liu (JIRA)
DocBuilder's getTimeElapsedSince Error
--

 Key: SOLR-2310
 URL: https://issues.apache.org/jira/browse/SOLR-2310
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
 Environment: JDK1.6
Reporter: tom liu


I have a job which runs for about 65 hours, but the dataimport?command=status 
HTTP request returns 5 hours.

in the getTimeElapsedSince method of DocBuilder:
{noformat} 
static String getTimeElapsedSince(long l) {
    l = System.currentTimeMillis() - l;
    return (l / (60000 * 60)) % 60 + ":" + (l / 60000) % 60 + ":" + (l / 1000)
        % 60 + "." + l % 1000;
}
{noformat} 

The hours computation is wrong; it should be:
{noformat} 
static String getTimeElapsedSince(long l) {
    l = System.currentTimeMillis() - l;
    return (l / (60000 * 60)) + ":" + (l / 60000) % 60 + ":" + (l / 1000)
        % 60 + "." + l % 1000;
}
{noformat} 
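To make the bug concrete, here is a self-contained sketch (hypothetical class and method names; it takes the elapsed interval in milliseconds directly instead of a start timestamp) showing why a 65-hour run is reported as 5 hours: the hours term is reduced modulo 60.

```java
public class ElapsedTimeDemo {
    // Buggy: the hours term is taken modulo 60, so 65 hours reads as 5.
    static String buggy(long elapsedMs) {
        return (elapsedMs / (60000L * 60)) % 60 + ":" + (elapsedMs / 60000) % 60
                + ":" + (elapsedMs / 1000) % 60 + "." + elapsedMs % 1000;
    }

    // Fixed: drop the % 60 on the hours term.
    static String fixed(long elapsedMs) {
        return (elapsedMs / (60000L * 60)) + ":" + (elapsedMs / 60000) % 60
                + ":" + (elapsedMs / 1000) % 60 + "." + elapsedMs % 1000;
    }

    public static void main(String[] args) {
        long elapsed = 65L * 60 * 60 * 1000; // 65 hours in milliseconds
        System.out.println(buggy(elapsed)); // 5:0:0.0
        System.out.println(fixed(elapsed)); // 65:0:0.0
    }
}
```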




Re: Lucene-trunk - Build # 1421 - Failure

2011-01-09 Thread Robert Muir
On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

 Error Message:
 CheckIndex failed

Maybe this is specific to pulsing? I noticed it's failed 3 times with
this identical pulsing stack trace:
Lucene-trunk/1421, tests-only/3590, tests-only/3570

However, this time it failed in a nightly build (perhaps the indexes
are still available on the hudson machine if we salvage them before the
next nightly build?). They should be under lucene/build/test/N/jrecrashXXtmp/

all 3 times the stacktrace is:
test: terms, freq, prox...ERROR [Java heap space]
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104)
at 
org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74)
at 
org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72)
at 
org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234)
at 
org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189)
at 
org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515)
at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)




[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-09 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Attachment: LUCENE-2657.patch

Added profiles to populate internal repositories at {{lucene/dist/maven/}} and 
{{solr/dist/maven/}} with generated artifacts.

To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc 
artifacts, run the following from the top level Lucene/Solr directory:

{code}
mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy
cd lucene
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
cd ../modules
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
{code}

To populate {{solr/dist/maven/}}, run the following from the top level 
Lucene/Solr directory:

{code}
mvn -N -P bootstrap install
cd solr
mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar 
deploy
{code}

 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}




[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-09 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979496#action_12979496
 ] 

Steven Rowe edited comment on LUCENE-2657 at 1/10/11 2:38 AM:
--

Added profiles to populate internal repositories at {{lucene/dist/maven/}} and 
{{solr/dist/maven/}} with generated artifacts.

To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc 
artifacts, run the following from the top level Lucene/Solr directory:

{code}
mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy
cd lucene
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
cd ../modules
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
{code}

To populate {{solr/dist/maven/}}, run the following from the top level 
Lucene/Solr directory:

{code}
mvn -N -P bootstrap install
cd solr
mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar 
deploy
{code}

  was (Author: steve_rowe):
Added profiles to populate internal repositories at {{lucene/dist/maven/}} 
and {{solr/dist/maven/}} with generated artifacts.

To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc 
artifacts, run the following from the top level Lucene/Solr directory:

{code}
mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy
cd lucene
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
cd ../modules
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
{code}

To populate {{lucene/dist/solr/}}, run the following from the top level 
Lucene/Solr directory:

{code}
mvn -N -P bootstrap
cd solr
mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar 
deploy
{code}
  