[jira] Commented: (SOLR-1316) Create autosuggest component

2010-06-19 Thread Andy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880589#action_12880589
 ] 

Andy commented on SOLR-1316:


Does this handle non-prefix matches?

For example, if user types "guit", I want to suggest both "guitar" and 
"electric guitar".

Would this patch do that?
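For context: a pure prefix structure (like the TernaryTree this issue builds on) only matches from the start of the stored string, so "electric guitar" would not surface for "guit" unless each token is indexed as its own entry. A minimal, hypothetical sketch of that token-level workaround (names and approach are illustrative, not from the patch):

```java
import java.util.*;

// Hypothetical sketch (not from the patch): index every token of each
// suggestion phrase so that a prefix of any word matches the whole phrase.
public class TokenSuggester {
    private final TreeMap<String, Set<String>> tokenToPhrases = new TreeMap<>();

    public void add(String phrase) {
        for (String token : phrase.toLowerCase().split("\\s+")) {
            tokenToPhrases.computeIfAbsent(token, k -> new TreeSet<>()).add(phrase);
        }
    }

    // Collect every phrase that contains a token starting with the prefix.
    public Set<String> suggest(String prefix) {
        Set<String> results = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : tokenToPhrases.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the prefix range
            results.addAll(e.getValue());
        }
        return results;
    }

    public static void main(String[] args) {
        TokenSuggester s = new TokenSuggester();
        s.add("guitar");
        s.add("electric guitar");
        s.add("drum kit");
        System.out.println(s.suggest("guit")); // [electric guitar, guitar]
    }
}
```

The same idea is usually expressed inside Solr with edge n-gram or shingle analysis at index time rather than a hand-rolled map.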

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: Next
>
> Attachments: SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, 
> SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, suggest.patch, 
> suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Doppleganger threads after ingestion completed

2010-06-19 Thread Lance Norskog
"Chewing up CPU" or "blocked"? The stack trace says it's blocked.

The sockets are abandoned by the program, yes, but TCP/IP itself has a
complex sequence for shutting down sockets that takes a few minutes.
If these sockets stay around for hours, then there's a real problem.
(In fact, there is a bug in the TCP/IP specification, 40 years old,
that causes zombie sockets that never shut down.)

The HTTP Solr server really needs a socket close() method.
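At the plain-socket level, deterministic shutdown looks like the sketch below. This is generic java.net usage, not the Solr client API; SO_LINGER(0) is shown only because it is the blunt instrument for skipping the lingering close states, at the cost of discarding unsent data.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Generic java.net sketch of deterministic socket shutdown; not Solr API.
public class SocketCloseDemo {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        // SO_LINGER with a zero timeout closes with an immediate RST instead
        // of the normal FIN sequence, so the socket never enters TIME_WAIT.
        // The cost: any unsent buffered data is discarded. Use sparingly.
        socket.setSoLinger(true, 0);
        try {
            socket.connect(new InetSocketAddress("localhost", 8983), 1000);
        } catch (IOException e) {
            // No server listening is fine here; the point is the explicit
            // close below, which releases the file descriptor deterministically.
        } finally {
            socket.close();
        }
        System.out.println("closed: " + socket.isClosed());
    }
}
```

For an HTTP client reusing keep-alive connections, the equivalent is explicitly shutting down whatever connection pool the client holds, rather than relying on the peer to time the sockets out.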

On Thu, Jun 17, 2010 at 6:08 AM,   wrote:
> Folks,
>
> I ran 20,000,000 records into Solr via the extractingUpdateRequestHandler
> under jetty.  The previous problems with resources have apparently been
> resolved by using HTTP/1.1 with keep-alive, rather than creating and
> destroying 20,000,000 sockets. ;-)  However, after the client terminates, I
> still find the Solr process chewing away CPU – indeed, there were 5 threads
> doing this.
>
> A thread dump yields the following partial trace for all 5 threads:
>
> "btpool0-13" prio=10 tid=0x41391000 nid=0xe7c runnable
> [0x7f4a8c789000]
>    java.lang.Thread.State: RUNNABLE
>     at
> org.mortbay.jetty.HttpParser$Input.blockForContent(HttpParser.java:925)
>     at org.mortbay.jetty.HttpParser$Input.read(HttpParser.java:897)
>     at
> org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
>     at
> org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:924)
>     at
> org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:904)
>     at org.apache.commons.fileupload.util.Streams.copy(Streams.java:119)
>     at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
>     at
> org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
>     at
> org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
>     at
> org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
>     at
> org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
>     at
> org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
>     at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>     at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> …
>
> I could be wrong, but it looks to me like either jetty or fileupload may
> have a problem here.  I have not looked at the jetty source code, but
> infinitely spinning processes even after the socket has been abandoned do
> not seem reasonable to me.  Thoughts?
>
> Karl
>
>



-- 
Lance Norskog
goks...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880542#action_12880542
 ] 

Yonik Seeley commented on LUCENE-2504:
--

More numbers: Windows 7:
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

f10_s sort only: 115 ms
sort against random field: 162 ms 

> sorting performance regression
> --
>
> Key: LUCENE-2504
> URL: https://issues.apache.org/jira/browse/LUCENE-2504
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880541#action_12880541
 ] 

Yonik Seeley commented on LUCENE-2504:
--

More numbers:  Ubuntu, Java 1.7.0-ea-b98 (64 bit):
f10_s sort only: 126 ms
sort against random field: 175 ms

> sorting performance regression
> --
>
> Key: LUCENE-2504
> URL: https://issues.apache.org/jira/browse/LUCENE-2504
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880540#action_12880540
 ] 

Yonik Seeley commented on LUCENE-2504:
--

My guess is that this is caused by LUCENE-2380, but I opened a separate issue 
since I'm not sure.
This is the same type of JVM performance issues reported by Mike in LUCENE-2143 
and myself in LUCENE-2380.

Setup:
  Same test index I used to test faceting: 10M doc index with 5 fields:
   -  f100000_s: a single valued string field with 100,000 unique values 
   -  f10000_s:  a single valued field with 10,000 unique values
   -  f1000_s:   a single valued field with 1,000 unique values
   -  f100_s:    a single valued field with 100 unique values
   -  f10_s:     a single valued field with 10 unique values

URLs I tested against Solr are of the form:
http://localhost:8983/solr/select?q=*:*&rows=1&sort=f10_s+asc

branch_3x
--
 f10_s sort only: 101 ms
sort against random field: 101 ms

trunk:
--
 f10_s sort only: 111 ms
sort against random field: 158 ms

This is not due to garbage collection or cache effects.  After you sort against 
a mix of fields, the performance is worse forever... you can go back to sorting 
against  f10_s only, and the performance never recovers.

System: Ubuntu on Phenom II 4x3.0GHz, Java 1.6_20

So my guess is that this is caused by the ord lookup going through PagedBytes, 
and the JVM not optimizing away the indirection when there is a mix of 
implementations.
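That "mix of implementations" effect can be reproduced outside Lucene with a tiny harness: a call site that has only ever seen one implementation can be devirtualized and inlined, but after a second class goes through it, the JIT typically falls back to virtual dispatch, even for the original receiver. A self-contained sketch (all names invented; timing output is illustrative and depends on warmup, compilation thresholds, and the JVM):

```java
// Self-contained sketch of the effect described above: a call site that has
// seen more than one implementation of an interface can no longer be
// statically devirtualized, and the slowdown can stick even when you go back
// to the original receiver.
public class MegamorphicDemo {
    interface OrdReader { long get(int i); }

    static final class ArrayReader implements OrdReader {
        final long[] a;
        ArrayReader(long[] a) { this.a = a; }
        public long get(int i) { return a[i]; }
    }

    // Same values, different class: its only job is to pollute the call site.
    static final class OffsetReader implements OrdReader {
        final long[] a;
        OffsetReader(long[] a) { this.a = a; }
        public long get(int i) { return a[i] + 0; }
    }

    // The shared call site: r.get(i) is monomorphic until a second
    // implementation has passed through here.
    static long sum(OrdReader r, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += r.get(i);
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] data = new long[n];
        for (int i = 0; i < n; i++) data[i] = i;
        OrdReader mono = new ArrayReader(data);

        long t0 = System.nanoTime();
        long s1 = sum(mono, n);              // call site has seen one class
        long monoNs = System.nanoTime() - t0;

        sum(new OffsetReader(data), n);      // now it has seen two

        long t1 = System.nanoTime();
        long s2 = sum(mono, n);              // same receiver, polluted site
        long mixedNs = System.nanoTime() - t1;

        System.out.println("sums equal: " + (s1 == s2));
        System.out.println("before mix: " + monoNs + " ns, after mix: " + mixedNs + " ns");
    }
}
```

A real measurement needs a proper benchmark harness with warmup; the sketch only shows where the polymorphism enters the hot loop.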


> sorting performance regression
> --
>
> Key: LUCENE-2504
> URL: https://issues.apache.org/jira/browse/LUCENE-2504
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880539#action_12880539
 ] 

Michael McCandless commented on LUCENE-2504:


I'll dig.

> sorting performance regression
> --
>
> Key: LUCENE-2504
> URL: https://issues.apache.org/jira/browse/LUCENE-2504
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)
sorting performance regression
--

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1965) Solr 4.0 performance improvements

2010-06-19 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1965:
---

Attachment: SOLR-1965.patch

Here's the patch for facet.method=fc (single valued) that uses the latest patch 
in LUCENE-2378 to fix the performance regression.

> Solr 4.0 performance improvements
> -
>
> Key: SOLR-1965
> URL: https://issues.apache.org/jira/browse/SOLR-1965
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-1965.patch
>
>
> Catch-all performance improvement issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-06-19 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880490#action_12880490
 ] 

Simon Rosenthal commented on SOLR-1911:
---

No - it seems to have cleared up with trunk also.

I'm OK with closing it but am really curious to know what changed between mid 
May and today to clear up the problem.

> File descriptor leak while indexing, may cause index corruption
> ---
>
> Key: SOLR-1911
> URL: https://issues.apache.org/jira/browse/SOLR-1911
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.5
> Environment: Ubuntu Linux, Java build 1.6.0_16-b01
> Solr Specification Version: 3.0.0.2010.05.12.16.17.46
>   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
> 16:17:46  -- built from updated trunk
>   Lucene Specification Version: 4.0-dev
>   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
>   Current Time: Thu May 13 12:21:12 EDT 2010
>   Server Start Time: Thu May 13 11:45:41 EDT 2010
>Reporter: Simon Rosenthal
>Priority: Critical
> Attachments: indexlsof.tar.gz, openafteropt.txt
>
>
> While adding documents to an already existing index using this build, the 
> number of open file descriptors increases dramatically until the open file 
> per-process limit is reached (1024) , at which point there are error messages 
> in the log to that effect. If the server is restarted the index may be corrupt.
> Commits are handled by autocommit every 60 seconds or 500 documents (usually 
> the time limit is reached first). 
> mergeFactor is 10.
> It looks as though each time a commit takes place, the number of open files 
> (obtained from "lsof -p `cat solr.pid` | egrep ' [0-9]+r '") increases by 
> 40. There are several open file descriptors associated with each file in the 
> index.
> Rerunning the same index updates with an older Solr (built from trunk in Feb 
> 2010) doesn't show this problem - the number of open files fluctuates up and 
> down as segments are created and merged, but stays basically constant.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880483#action_12880483
 ] 

Yonik Seeley edited comment on LUCENE-2380 at 6/19/10 9:55 AM:
---

It was really tricky performance testing this.

If I started solr and tested one type of faceting exclusively, the performance 
impact of going through the new FieldCache interfaces (PackedInts for ord 
lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the 
table above)... and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were 
repeatable.

Testing #1 alone resulted in trunk slowing down ~ 4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the 
best... as if the JVM correctly specialized it, but then discarded it later, 
never to return.
I saw the same effect on both an AMD Phenom II w/ Ubuntu, Java 1.6_14 and Win7 
with a Core2, Java 1.6_17, both 64 bit.  The drop on Win7 was only 20% though.

So: you can't always depend on the JVM being able to inline stuff for you, and 
it seems very hard to determine when it can.
This obviously has implications for the lucene benchmarker too.


  was (Author: ysee...@gmail.com):
It was really tricky performance testing this.

If I started solr and tested one type of faceting exclusively, the performance 
impact of going through the new FieldCache interfaces (PackedInts for ord 
lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the 
table above)... and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were 
repeatable.

Testing #1 alone resulted in trunk slowing down ~ 4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the 
best... as if the JVM correctly specialized it, but then discarded it later, 
never to return.

So: you can't always depend on the JVM being able to inline stuff for you, and 
it seems very hard to determine when it can.
This obviously has implications for the lucene benchmarker too.

  
> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
> LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
> LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880483#action_12880483
 ] 

Yonik Seeley commented on LUCENE-2380:
--

It was really tricky performance testing this.

If I started solr and tested one type of faceting exclusively, the performance 
impact of going through the new FieldCache interfaces (PackedInts for ord 
lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the 
table above)... and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were 
repeatable.

Testing #1 alone resulted in trunk slowing down ~ 4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the 
best... as if the JVM correctly specialized it, but then discarded it later, 
never to return.

So: you can't always depend on the JVM being able to inline stuff for you, and 
it seems very hard to determine when it can.
This obviously has implications for the lucene benchmarker too.


> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
> LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
> LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-06-19 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2380:
-

Attachment: LUCENE-2380_direct_arr_access.patch

This patch adds the ability to get at the raw arrays from the Direct* classes, 
and using those fixes the performance regressions in the "fc" faceting I was 
seeing.

To do this, it adds this to DocTermsIndex.  Anyone have a better solution?
{code}
/** @lucene.internal */
public abstract PackedInts.Reader getDocToOrd();
{code}


> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
> LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
> LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
It will not disappear in changes.txt, but at least it should not be so
prominent.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, June 19, 2010 11:38 AM
> To: dev@lucene.apache.org
> Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> 
> OK, I think just removing the text claiming this is fixed, is good?
> 
> Mike
> 
> On Sat, Jun 19, 2010 at 5:29 AM, Uwe Schindler  wrote:
> > Mike, Koji: The release is out, but should I maybe simply remove the
> > announcement line (or simply strike it out) on the lucene.apache.org
> > pages, so nobody really expects this to be fixed?
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: Uwe Schindler [mailto:u...@thetaphi.de]
> >> Sent: Saturday, June 19, 2010 10:19 AM
> >> To: dev@lucene.apache.org
> >> Subject: RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> >>
> >> No problem, I fixed it now, see patches. For trunk, this was not an
> >> issue,
> > but
> >> for 3x, 3.0 and 2.9.
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> > -Original Message-
> >> > From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
> >> > Sent: Saturday, June 19, 2010 10:11 AM
> >> > To: dev@lucene.apache.org
> >> > Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> >> >
> >> > (10/06/19 15:36), Uwe Schindler wrote:
> >> > > Hi Koji,
> >> > >
> >> > >
> >> > >>>    - FieldCacheImpl.getStringIndex() no longer throws an
> >> > >>> exception when term count exceeds doc count.
> >> > >>>
> >> > >>>
> >> > >> I think it is LUCENE-2142, but after it was fixed,
> >> > >> getStringIndex() still
> >> > >>
> >> > > throws
> >> > >
> >> > >> AIOOBE? Am I missing something?
> >> > >>
> >> > > I saw that you wrote a comment on 2142 on June 7; we had overlooked
> >> > this.
> >> > > You should have reopened it and stopped the release vote :(
> >> > > You should have reopened it and stop the release vote :(
> >> > >
> >> > > Uwe
> >> > >
> >> > >
> >> > Yeah. I should have done that, but while the vote was going on, I simply
> >> > forgot the issue.
> >> > Then when I read your release announcement, it reminded me of the issue.
> >> > I'm sorry about that...
> >> >
> >> > Koji
> >> >
> >> > --
> >> > http://www.rondhuit.com/en/
> >> >
> >> >
> >> > ---
> >> > -- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> >> > additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> >> additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> > additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Michael McCandless
OK, I think just removing the text claiming this is fixed, is good?

Mike

On Sat, Jun 19, 2010 at 5:29 AM, Uwe Schindler  wrote:
> Mike, Koji: The release is out, but should I maybe simply remove the
> announcement line (or simply strike it out) on the lucene.apache.org pages, so
> nobody really expects this to be fixed?
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>> Sent: Saturday, June 19, 2010 10:19 AM
>> To: dev@lucene.apache.org
>> Subject: RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
>>
>> No problem, I fixed it now, see patches. For trunk, this was not an issue,
> but
>> for 3x, 3.0 and 2.9.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>> > -Original Message-
>> > From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
>> > Sent: Saturday, June 19, 2010 10:11 AM
>> > To: dev@lucene.apache.org
>> > Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
>> >
>> > (10/06/19 15:36), Uwe Schindler wrote:
>> > > Hi Koji,
>> > >
>> > >
>> > >>>    - FieldCacheImpl.getStringIndex() no longer throws an exception
>> > >>> when term count exceeds doc count.
>> > >>>
>> > >>>
>> > >> I think it is LUCENE-2142, but after it was fixed, getStringIndex()
>> > >> still
>> > >>
>> > > throws
>> > >
>> > >> AIOOBE? Am I missing something?
>> > >>
>> > > I saw that you wrote a comment on 2142 on June 7; we had overlooked
>> > this.
>> > > You should have reopened it and stopped the release vote :(
>> > >
>> > > Uwe
>> > >
>> > >
>> > Yeah. I should have done that, but while the vote was going on, I simply
>> > forgot the issue.
>> > Then when I read your release announcement, it reminded me of the issue.
>> > I'm sorry about that...
>> >
>> > Koji
>> >
>> > --
>> > http://www.rondhuit.com/en/
>> >
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> > additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
Mike, Koji: The release is out, but should I maybe simply remove the
announcement line (or simply strike it out) on the lucene.apache.org pages, so
nobody really expects this to be fixed?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Saturday, June 19, 2010 10:19 AM
> To: dev@lucene.apache.org
> Subject: RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> 
> No problem, I fixed it now, see patches. For trunk, this was not an issue,
but
> for 3x, 3.0 and 2.9.
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -Original Message-
> > From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
> > Sent: Saturday, June 19, 2010 10:11 AM
> > To: dev@lucene.apache.org
> > Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> >
> > (10/06/19 15:36), Uwe Schindler wrote:
> > > Hi Koji,
> > >
> > >
> > >>>- FieldCacheImpl.getStringIndex() no longer throws an exception
> > >>> when term count exceeds doc count.
> > >>>
> > >>>
> > >> I think it is LUCENE-2142, but after it was fixed, getStringIndex()
> > >> still
> > >>
> > > throws
> > >
> > >> AIOOBE? Am I missing something?
> > >>
> > > I saw that you wrote a comment on 2142 on June 7; we had overlooked
> > this.
> > > You should have reopened it and stopped the release vote :(
> > >
> > > Uwe
> > >
> > >
> > Yeah. I should have done that, but while the vote was going on, I simply
> > forgot the issue.
> > Then when I read your release announcement, it reminded me of the issue.
> > I'm sorry about that...
> >
> > Koji
> >
> > --
> > http://www.rondhuit.com/en/
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> > additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880463#action_12880463
 ] 

Michael McCandless commented on LUCENE-2142:


Thanks Uwe!

So your fix avoids any exception altogether.  On 3x, you just stop
loading when we hit a termOrd > number of docs.  On trunk, we keep
loading, simply growing the array as needed.

I'm torn on what the best disposition here is.  This API should only
be used on single-token (per doc) fields, so this handling we're
adding/fixing is about how to handle the misuse of the API.

Neither solution is great -- throwing an exception is nasty since you
could be fine for some time and then only on indexing enough docs,
perhaps well into production, trip the exception.  But then silently
pretending nothing is wrong is also not great because the app then has
no clue.

Really this'd be a great time to use a logging framework -- we'd log
an error, and then not throw an exception.

Net/net I think your solution (don't throw an exception) is the lesser
evil at this time, so I think we should go with that.

But: I think we should also fix trunk?  Ie, if we hit termOrd > numDocs,
silently break, instead of trying to grow the array.  Because now (on
trunk) if you try to load a DocTermsIndex on a large tokenized text
field in a large index you'll (try to) use insane amounts of memory...



> FieldCache.getStringIndex should not throw exception if term count exceeds 
> doc count
> 
>
> Key: LUCENE-2142
> URL: https://issues.apache.org/jira/browse/LUCENE-2142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch
>
>
> Spinoff of LUCENE-2133/LUCENE-831.
> Currently FieldCache cannot handle more than one value per field.
> We may someday want to fix that... but until that day:
> FieldCache.getStringIndex currently does a simplistic check to try to
> catch when you've accidentally allowed more than one term per field,
> by testing if the number of unique terms exceeds the number of
> documents.
> The problem is, this is not a perfect check, in that it allows false
> negatives (you could have more than one term per field for some docs
> and the check won't catch you).
> Further, the exception thrown is the unchecked RuntimeException.
> So this means... you could happily think all is good, until some day,
> well into production, once you've updated enough docs, suddenly the
> check will catch you and throw an unhandled exception, stopping all
> searches [that need to sort by this string field] in their tracks.
> It's not gracefully degrading.
> I think we should simply remove the test, ie, if you have more terms
> than docs then the terms simply overwrite one another.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
No problem, I fixed it now - see the patches. For trunk this was not an
issue, but it was for 3.x, 3.0 and 2.9.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
> Sent: Saturday, June 19, 2010 10:11 AM
> To: dev@lucene.apache.org
> Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
> 
> (10/06/19 15:36), Uwe Schindler wrote:
> > Hi Koji,
> >
> >
> >>>- FieldCacheImpl.getStringIndex() no longer throws an exception
> >>> when term count exceeds doc count.
> >>>
> >>>
> >> I think it is LUCENE-2142, but after it was fixed, getStringIndex()
> >> still
> >>
> > throws
> >
> >> AIOOBE? Am I missing something?
> >>
> > I have seen you wrote a comment to 2142 on June 7; we have overlooked
> > this.
> > You should have reopened it and stopped the release vote :(
> >
> > Uwe
> >
> >
> Yeah, I should have done that, but while the vote was going on I simply
> forgot about the issue. Then your release announcement reminded me of it.
> I'm sorry about that...
> 
> Koji
> 
> --
> http://www.rondhuit.com/en/
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: LUCENE-2142-fix-3x.patch
LUCENE-2142-fix-trunk.patch

Here is a patch with a test for 3.x and earlier. The trunk patch only
contains the test, which passes.

> FieldCache.getStringIndex should not throw exception if term count exceeds 
> doc count
> 
>
> Key: LUCENE-2142
> URL: https://issues.apache.org/jira/browse/LUCENE-2142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch
>
>




[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: (was: LUCENE-2142-fix.patch)

> FieldCache.getStringIndex should not throw exception if term count exceeds 
> doc count
> 
>
> Key: LUCENE-2142
> URL: https://issues.apache.org/jira/browse/LUCENE-2142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch
>
>




Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Koji Sekiguchi

(10/06/19 15:36), Uwe Schindler wrote:
> Hi Koji,
>
>>> - FieldCacheImpl.getStringIndex() no longer throws an exception when
>>> term count exceeds doc count.
>>
>> I think it is LUCENE-2142, but after it was fixed, getStringIndex() still
>> throws AIOOBE? Am I missing something?
>
> I have seen you wrote a comment to 2142 on June 7; we have overlooked this.
> You should have reopened it and stopped the release vote :(
>
> Uwe

Yeah, I should have done that, but while the vote was going on I simply
forgot about the issue. Then your release announcement reminded me of it.
I'm sorry about that...

Koji

--
http://www.rondhuit.com/en/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: LUCENE-2142-fix.patch

After a coffee I saw the problem, too - stupid :(

Here is the fix for 3.x (also 3.0 and 2.9) - in trunk the fix is not needed, as 
there are growable arrays. Maybe we should add a simple test to all branches!
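The growable-array behavior on trunk that makes the fix unnecessary there might look roughly like this. This is a hand-rolled illustration only; Lucene trunk actually does this via its own array-growth helpers (ArrayUtil), and the method name `recordTerms` is invented:

```java
import java.util.Arrays;

// Illustration of the grow-on-demand approach on trunk: rather than
// sizing the by-ordinal array to maxDoc up front, grow it whenever a
// new term ordinal would overflow it -- so term count > doc count
// never triggers an ArrayIndexOutOfBoundsException.
public class GrowableOrdsSketch {

    static String[] recordTerms(String[] terms, int initialSize) {
        String[] byOrd = new String[initialSize];
        int termOrd = 0;
        for (String term : terms) {
            if (termOrd == byOrd.length) {
                // Over-allocate ~1.5x to keep amortized growth cheap.
                byOrd = Arrays.copyOf(byOrd, byOrd.length + (byOrd.length >> 1) + 1);
            }
            byOrd[termOrd++] = term;
        }
        return Arrays.copyOf(byOrd, termOrd); // trim to the actual term count
    }

    public static void main(String[] args) {
        // More unique terms (6) than the initial capacity (4): no exception,
        // the array simply grows.
        String[] ords = recordTerms(new String[]{"a", "b", "c", "d", "e", "f"}, 4);
        System.out.println(ords.length);
    }
}
```

This also shows why trunk's memory use can explode on a large tokenized field, as Michael points out elsewhere in the thread: the array keeps growing with every unique term.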



> FieldCache.getStringIndex should not throw exception if term count exceeds 
> doc count
> 
>
> Key: LUCENE-2142
> URL: https://issues.apache.org/jira/browse/LUCENE-2142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2142-fix.patch
>
>
