[jira] Created: (SOLR-1956) luke cannot be launched by ant luke

2010-06-16 Thread Koji Sekiguchi (JIRA)
luke cannot be launched by ant luke
---

 Key: SOLR-1956
 URL: https://issues.apache.org/jira/browse/SOLR-1956
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1, 4.0
Reporter: Koji Sekiguchi
Priority: Trivial


After the Lucene/Solr merge, we need to compile lucene/solr manually before 
luke can be launched.

For branch_3x:

{code}
$ cd solr
$ ant luke
=> Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/lucene/search/Collector

$ ant compile
$ ant luke
=> Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/lucene/xmlparser/CoreParser

$ cd ../lucene/contrib/xml-query-parser
$ ant
$ cd ../../../solr
$ ant luke
=> luke can be launched
{code}

For trunk, it seems that luke-1.0.1 uses o.a.l.a.SimpleAnalyzer, but the class's 
package has been changed to o.a.l.a.core (and luke-1.0.1 doesn't support 
flex in the first place?):
{code}
$ ant luke
=> Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/lucene/analysis/SimpleAnalyzer
{code}
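
The package move means luke-1.0.1, which was compiled against the branch_3x 
location, can no longer resolve the class on trunk. A minimal illustration 
(a sketch, assuming only the package name changed):

{code}
// luke-1.0.1 expects the branch_3x location:
import org.apache.lucene.analysis.SimpleAnalyzer;        // NoClassDefFoundError on trunk

// on trunk the same class now lives under the new package:
// import org.apache.lucene.analysis.core.SimpleAnalyzer;
{code}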

So I'd like to fix it for branch_3x with luke-1.0.1 first.
 




[jira] Resolved: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1885.


Fix Version/s: 1.4.1
   (was: Next)
   Resolution: Fixed

Looks good, and just in time for 1.4.1.  Thanks Erik!

> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: 1.4.1
>
> Attachments: stream_release_connection_fix.diff, TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Resolved: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved LUCENE-2490.
---

Fix Version/s: 4.0
   Resolution: Fixed

applied to 3.x and trunk

This does not change to -SNAPSHOT; that could (perhaps) happen in a 
different issue

> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




Build failed in Hudson: Lucene-trunk #1218

2010-06-16 Thread Apache Hudson Server
See 

Changes:

[mikemccand] LUCENE-2380: hard cutover of all preflex APIs to flex

[mikemccand] LUCENE-2380: hard cutover of all preflex APIs to flex

[rmuir] LUCENE-2413: directory and package fixes

[rmuir] LUCENE-2413: directory and package fixes

[rmuir] correct eol-style

--
[...truncated 3346 lines...]
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestPositiveScoresOnlyCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestPrefixFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestPrefixInBooleanQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.187 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestPrefixQuery
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryTermVector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryWrapperFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.008 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpQuery
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.028 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.067 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom2
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 10.734 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScoreCachingWrappingScorer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScorerPerf
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.619 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSetNorm
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimilarity
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 2.156 sec
[junit] 
[junit] Testsuite: 
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 0.12 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.212 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 4.351 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.01 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 4.126 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.035 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.009 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.173 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.533 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.046 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.013 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.037 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcardRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.7 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.116 sec

[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879608#action_12879608
 ] 

Ryan McKinley commented on LUCENE-2490:
---

Yes, I guess there are two issues going on...
1. have 'ant generate-maven-artifacts' build a lucene+solr combo that works
2. get hudson to deploy SNAPSHOT artifacts to: 
https://repository.apache.org/content/groups/snapshots-group/

with snapshot artifacts, people can knowingly develop against the latest from 
/trunk (or branch-3x)


> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Chris Hostetter
: 
: Hmmm...it actually looks like the reverse happened?
: 
: 1.4 was built with javadoc 1.6:

whoops! ... you're correct, i misread the diff.

: I'll try doing the next RC with 1.6 instead?

for the javadocs you mean? .. yeah that's probably wise -- just to 
minimize the total changes (it's a minor thing, but if it's easy to do 
we might as well)



-Hoss





Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Mark Miller

On 6/16/10 8:31 PM, Yonik Seeley wrote:

The newly provided patch at
https://issues.apache.org/jira/browse/SOLR-1885 looks correct.
Should we include it?

-Yonik
http://www.lucidimagination.com



+1

--
- Mark

http://www.lucidimagination.com




[jira] Commented: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Erik Hetzner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879595#action_12879595
 ] 

Erik Hetzner commented on SOLR-1885:


FWIW, if I am correct, this bug was introduced by the do...while loop added 
to fix SOLR-1711.

> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: Next
>
> Attachments: stream_release_connection_fix.diff, TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Resolved: (SOLR-1900) move Solr to flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1900.


Fix Version/s: 4.0
   (was: Next)
   Resolution: Fixed

closing.  LUCENE-2378 did the rest.

> move Solr to flex APIs
> --
>
> Key: SOLR-1900
> URL: https://issues.apache.org/jira/browse/SOLR-1900
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, 
> SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Yonik Seeley
The newly provided patch at
https://issues.apache.org/jira/browse/SOLR-1885 looks correct.
Should we include it?

-Yonik
http://www.lucidimagination.com


On Wed, Jun 16, 2010 at 8:21 PM, Mark Miller  wrote:
> On 6/16/10 7:00 PM, Chris Hostetter wrote:
>
>> the javadocs seem to have been built with javadoc 1.6 (not 1.5)
>> ... this seems to have changed the link style somewhat significantly.
>> doesn't seem like it's a problem, but it was a little hard to review
>> for changes.  If we do another RC it may be worthwhile to ensure
>> Javadoc 1.5 is used for consistency with Solr 1.4.0
>>
>> (if JAVA_HOME is set to a 1.5 java install during build this shouldn't
>> be a problem)
>>
>
> Hmmm...it actually looks like the reverse happened?
>
> 1.4 was built with javadoc 1.6:
>
> 
>
> and my 1.4.1 rc was built with 1.5:
>
> 
>
> I'll try doing the next RC with 1.6 instead?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Mark Miller

On 6/16/10 7:00 PM, Chris Hostetter wrote:


the javadocs seem to have been built with javadoc 1.6 (not 1.5)
... this seems to have changed the link style somewhat significantly.
doesn't seem like it's a problem, but it was a little hard to review
for changes.  If we do another RC it may be worthwhile to ensure
Javadoc 1.5 is used for consistency with Solr 1.4.0

(if JAVA_HOME is set to a 1.5 java install during build this shouldn't
be a problem)



Hmmm...it actually looks like the reverse happened?

1.4 was built with javadoc 1.6:




and my 1.4.1 rc was built with 1.5:




I'll try doing the next RC with 1.6 instead?

--
- Mark

http://www.lucidimagination.com




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Mark Miller

On 6/16/10 7:00 PM, Chris Hostetter wrote:


I'm seeing multiple velocity contrib jars w/odd file names...

$ ls contrib/velocity/src/main/solr/lib/*solr*
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.2-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-X.Y.M.jar

...this is probably a show stopper that will require a new RC.

having multiple versions of the jar could easily break things with
class incompatibilities depending on what version they get loaded in
by the classloader.

(FWIW: all of the "solr" jars are supposed to be in "dist" ... but
it looks like that dir is where the apache-solr-velocity jar was in
1.4.0 as well, so probably not something we should change in this
release)



I'm actually seeing both
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.0.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1-dev.jar

In the Solr 1.4 release...

Won't happen this time though.



--
- Mark

http://www.lucidimagination.com




[jira] Assigned: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-06-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2167:
---

Assignee: Robert Muir

> Implement StandardTokenizer with the UAX#29 Standard
> 
>
> Key: LUCENE-2167
> URL: https://issues.apache.org/jira/browse/LUCENE-2167
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Affects Versions: 3.1
>Reporter: Shyamal Prasad
>Assignee: Robert Muir
>Priority: Minor
> Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
> LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
> LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
> LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> It would be really nice for StandardTokenizer to adhere straight to the 
> standard as much as we can with jflex. Then its name would actually make 
> sense.
> Such a transition would involve renaming the old StandardTokenizer to 
> EuropeanTokenizer, as its javadoc claims:
> bq. This should be a good tokenizer for most European-language documents
> The new StandardTokenizer could then say
> bq. This should be a good tokenizer for most languages.
> All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
> can stay with that EuropeanTokenizer, and it could be used by the european 
> analyzers.




[jira] Reopened: (SOLR-1951) extractingUpdateHandler doesn't close socket handles promptly, and indexing load tests eventually run out of resources

2010-06-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened SOLR-1951:



fixing status so it's clear no changes were made

> extractingUpdateHandler doesn't close socket handles promptly, and indexing 
> load tests eventually run out of resources
> --
>
> Key: SOLR-1951
> URL: https://issues.apache.org/jira/browse/SOLR-1951
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.4.1, 1.5
> Environment: sun java
> solr 1.5 build based on trunk
> debian linux "lenny"
>Reporter: Karl Wright
> Attachments: solr-1951.zip
>
>
> When multiple threads pound on extractingUpdateRequestHandler using multipart 
> form posting over an extended period of time, I'm seeing a huge number of 
> sockets piling up in the following state:
> tcp6   0  0 127.0.0.1:8983  127.0.0.1:44058 TIME_WAIT
> Despite the fact that the client can only have 10 sockets open at a time, 
> huge numbers of sockets accumulate that are in this state:
> r...@duck6:~# netstat -an | fgrep :8983 | wc
>   28223  169338 2257840
> r...@duck6:~#
> The sheer number of sockets lying around seems to eventually cause 
> commons-fileupload to fail (silently - another bug) in creating a temporary 
> file to contain the content data.  This causes Solr to erroneously return a 
> 400 code with "missing_content_data" or some such to the indexing poster.




[jira] Resolved: (SOLR-1951) extractingUpdateHandler doesn't close socket handles promptly, and indexing load tests eventually run out of resources

2010-06-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1951.


Fix Version/s: (was: 1.5)
   (was: 1.4.1)
   Resolution: Invalid

> extractingUpdateHandler doesn't close socket handles promptly, and indexing 
> load tests eventually run out of resources
> --
>
> Key: SOLR-1951
> URL: https://issues.apache.org/jira/browse/SOLR-1951
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.4.1, 1.5
> Environment: sun java
> solr 1.5 build based on trunk
> debian linux "lenny"
>Reporter: Karl Wright
> Attachments: solr-1951.zip
>
>
> When multiple threads pound on extractingUpdateRequestHandler using multipart 
> form posting over an extended period of time, I'm seeing a huge number of 
> sockets piling up in the following state:
> tcp6   0  0 127.0.0.1:8983  127.0.0.1:44058 TIME_WAIT
> Despite the fact that the client can only have 10 sockets open at a time, 
> huge numbers of sockets accumulate that are in this state:
> r...@duck6:~# netstat -an | fgrep :8983 | wc
>   28223  169338 2257840
> r...@duck6:~#
> The sheer number of sockets lying around seems to eventually cause 
> commons-fileupload to fail (silently - another bug) in creating a temporary 
> file to contain the content data.  This causes Solr to erroneously return a 
> 400 code with "missing_content_data" or some such to the indexing poster.




[jira] Assigned: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-06-16 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned LUCENE-2167:
---

Assignee: (was: Steven Rowe)

After a discussion with Robert on #lucene, I think this issue is complete - we 
can add more stuff later in a separate issue.

> Implement StandardTokenizer with the UAX#29 Standard
> 
>
> Key: LUCENE-2167
> URL: https://issues.apache.org/jira/browse/LUCENE-2167
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Affects Versions: 3.1
>Reporter: Shyamal Prasad
>Priority: Minor
> Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
> LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
> LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
> LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
> LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> It would be really nice for StandardTokenizer to adhere straight to the 
> standard as much as we can with jflex. Then its name would actually make 
> sense.
> Such a transition would involve renaming the old StandardTokenizer to 
> EuropeanTokenizer, as its javadoc claims:
> bq. This should be a good tokenizer for most European-language documents
> The new StandardTokenizer could then say
> bq. This should be a good tokenizer for most languages.
> All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
> can stay with that EuropeanTokenizer, and it could be used by the european 
> analyzers.




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Chris Hostetter

: Interesting - I'll look into it - def using java 1.5 for the build (1.6
: and 1.7 are also on this machine though).

it may just be that the 1.6 version of javadoc is first in your PATH ... 
not sure how  calls javadoc.

: Yuck - this gets created by the build (and I had done the build a couple times
: with the wrong system params - version x.y.m and 1.4.2 and the default of
: 1.4.1-dev) - and clean doesn't remove them. Will take care of - this should be
: cleaned up - will likely be fixed by changing where this jar is created as you
: mention below - I'll handle with a manual clean for now.

yeah .. the bug is definitely in the velocity build process, but changing 
the location of the jar in an X.Y.1 release seems bad.


: Ugg - yeah, the local modifications are because I have to modify the
: prepare-release target to pass my username to svn - else it tries to use mark
: rather than markrmiller. That is a major pain (I don't like that it tries to
: commit for me anyway). It would be easy to get around by calling other targets
: (and skipping build-site) if it didn't have some maven cruft in it -
: I'll look into a workaround.

i'm not really familiar with that target -- but you should just commit 
that modification to all the branches (1.4, 3x, and trunk) .. that's a 
horrible assumption for our build file to make.



-Hoss





[jira] Commented: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

2010-06-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879580#action_12879580
 ] 

Hoss Man commented on SOLR-1954:


bq. if the structure is poor and hard to add additional metadata to, which 
would be beneficial to new users, let's change it

As long as there is an option people can turn on to force the legacy behavior, 
there's nothing wrong with that.

In its simplest form we can just add a new Highlighting Component (with a 
different class name) that is registered by default as the component 
"highlight" and document in CHANGES.txt that if people need/want the old one 
they should modify their solrconfig.xml to register it explicitly.

alternately we can keep using the existing class, and modify it so that it 
changes its behavior based on some init param, ditto previous comments about 
default behavior and CHANGES.txt

(back compat should be *easy* on upgrade, but i'd rather tell existing users 
"add this one line to your config if you really need the exact same response 
structure instead of this new better structure" than tell new and existing 
users "this is the really klunky hoop you have to jump through to make sense of 
all this hot new data we are returning")

> Highlighter component should expose snippet character offsets and the score.
> 
>
> Key: SOLR-1954
> URL: https://issues.apache.org/jira/browse/SOLR-1954
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets nor the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is there so it's pretty easy to 
> output it in some way.  The challenge is deciding on the output and its 
> ramifications on backwards compatibility.  The current highlighter component 
> response structure doesn't lend itself to adding any new data, unfortunately. 
>  I wish the original implementer had some foresight.  Unfortunately all the 
> highlighting tests assume this structure.  Here is a snippet of the current 
> response structure in Solr's sample data searching for "sdram" for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="name">
>    <str>CORSAIR ValueSelect 1GB 184-Pin DDR <em>SDRAM</em>
> Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps as a little hack, we introduce a pseudo field called 
> text_startCharOffset which is the concatenation of the matching field and 
> "_startCharOffset".  This would be an array of ints.  Likewise, there would 
> be another array for endCharOffset and score.
> Thoughts?

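A rough sketch of what the pseudo-field idea above could look like when 
building the response (the helper shape, field "name", and single-fragment 
arrays are illustrative assumptions, not the attached patch):

{code}
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;

// Hypothetical: emit parallel arrays alongside the existing snippet array.
NamedList addFragment(String field, String snippet,
                      int startOffset, int endOffset, float score) {
  NamedList docHl = new SimpleOrderedMap();
  docHl.add(field, new String[] { snippet });                        // existing output
  docHl.add(field + "_startCharOffset", new int[] { startOffset });  // proposed
  docHl.add(field + "_endCharOffset", new int[] { endOffset });      // proposed
  docHl.add(field + "_score", new float[] { score });                // proposed
  return docHl;
}
{code}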



Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Mark Miller

Inline responses:

On 6/16/10 7:00 PM, Chris Hostetter wrote:

On Tue, 15 Jun 2010, Mark Miller wrote:

: Date: Tue, 15 Jun 2010 11:54:46 -0400
: From: Mark Miller
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: [VOTE] Release Solr 1.4.1
:
: Please vote on releasing the Solr 1.4.1 artifacts located at
: http://people.apache.org/~markrmiller/staging-area/

Source wise things look good -- but I think we're going to need another RC
because of some cruft i'm seeing in the release. full details below...

--

### Review of these artifacts...

md5sum apache-solr-1.4.1.tgz apache-solr-1.4.1.zip
915febc17bd40eb7db2f8be318fd439d  apache-solr-1.4.1.tgz
f672e3759e8f0d3d0eb738f2dbca5bf1  apache-solr-1.4.1.zip

### Things I looked at

  - recursive diff of tgz and zip artifacts to make sure they match
- ignoring line endings for obvious reasons
  - basic usage of example via the tutorial
  - recursive diff of 1.4.0 with 1.4.1
- verify that file lists are the same, except where expected
- detailed reviewed all src diffs

### Notes

all src diffs correspond with my expectations based on CHANGES.txt

--

the javadocs seem to have been built with javadoc 1.6 (not 1.5)
... this seems to have changed the link style somewhat significantly.
doesn't seem like it's a problem, but it was a little hard to review
for changes.  If we do another RC it may be worthwhile to ensure
Javadoc 1.5 is used for consistency with Solr 1.4.0

(if JAVA_HOME is set to a 1.5 java install during build this shouldn't
be a problem)


Interesting - I'll look into it - def using java 1.5 for the build 
(1.6 and 1.7 are also on this machine though).




--

there is an odd looking "bin" directory at the top level of the
release containing all the test files and the *.class files.  not sure
where that came from (artifact of miller's IDE?) ... it's 7MB of cruft.


Very weird - Eclipse does work with a bin folder, but it's at the top 
level of the project (eg next to build) - why the heck does the Solr 
dist process pull that in? I'll take care of it, but I don't see why 
that should happen - part of pulling in the src dirs or something? Easy 
enough to remove the bin folder first.




--

I'm seeing multiple velocity contrib jars w/odd file names...

$ ls contrib/velocity/src/main/solr/lib/*solr*
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.2-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-X.Y.M.jar

...this is probably a show stopper that will require a new RC.


Yuck - this gets created by the build (and I had done the build a couple 
times with the wrong system params - version x.y.m and 1.4.2 and the 
default of 1.4.1-dev) - and clean doesn't remove them. Will take care of 
- this should be cleaned up - will likely be fixed by changing where 
this jar is created as you mention below - I'll handle with a manual 
clean for now.




having multiple versions of the jar could easily break things with
class incompatibilities depending on what version they get loaded in
by the classloader.

(FWIW: all of the "solr" jars are supposed to be in "dist" ... but
it looks like that dir is where the apache-solr-velocity jar was in
1.4.0 as well, so probably not something we should change in this
release)

--

the example has an uncompressed war, a request log file, and
an already constructed index in it...

Only in tgz/apache-solr-1.4.1/example/logs: 2010_06_16.request.log
Only in tgz/apache-solr-1.4.1/example/solr: data
Only in tgz/apache-solr-1.4.1/example/work:
Jetty_0_0_0_0_8983_solr.war__solr__k1kf17

...these aren't supposed to be there (but probably don't hurt much)


Will kill these too - more stuff that a master clean would ideally remove. 
It would be ideal if a clean prepare-release or clean-all prepare-release or 
something put you in the right shape with regard to all this cruft.




--

version info says...

Solr Implementation Version: 1.4.1 954930M - mark - 2010-06-15 11:23:44

...that "M" indicates local modifications.  it's not really a blocker,
but ideally the build should be from a known reproducible state of
SVN (ie: since it looks like we may do another release candidate,
let's make sure it's an unmodified checkout, but it's not a huge deal)


Ugg - yeah, the local modifications are because I have to modify the 
prepare-release target to pass my username to svn - else it tries to use 
mark rather than markrmiller. That is a major pain (I don't like that it 
tries to commit for me anyway). It would be easy to get around by 
calling other targets (and skipping build-site) if it didn't have some 
maven cruft in it - I'll look into a workaround.


I don't mind the auto stuff for building the site, but I'd rather the 
commit part was either a separate target or something to be done manually.




---


-Hoss


--

[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879579#action_12879579
 ] 

Steven Rowe commented on LUCENE-2490:
-

I was focusing on the local repo install use case, where each install 
overwrites the previous same-named install.  What use case are you thinking of? 
 What deploy target is intended?

> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879577#action_12879577
 ] 

Ryan McKinley commented on LUCENE-2490:
---

in maven, SNAPSHOT builds get checked and replaced often.  If it's not a 
SNAPSHOT, it gets downloaded once and that is that.

We need some way to reference the /trunk builds from maven.

> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879571#action_12879571
 ] 

Steven Rowe commented on LUCENE-2490:
-

bq. If you can get things to work keeping the -dev (and still marked as a 
SNAPSHOT build) that would be great!

Hmm, why is it a requirement that it's marked as a SNAPSHOT build?


> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




Re: [VOTE] Release Solr 1.4.1

2010-06-16 Thread Chris Hostetter
On Tue, 15 Jun 2010, Mark Miller wrote:

: Date: Tue, 15 Jun 2010 11:54:46 -0400
: From: Mark Miller 
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: [VOTE] Release Solr 1.4.1
: 
: Please vote on releasing the Solr 1.4.1 artifacts located at
: http://people.apache.org/~markrmiller/staging-area/

Source wise things look good -- but I think we're going to need another RC
because of some cruft i'm seeing in the release. full details below...

--

### Review of these artifacts...

md5sum apache-solr-1.4.1.tgz apache-solr-1.4.1.zip
915febc17bd40eb7db2f8be318fd439d  apache-solr-1.4.1.tgz
f672e3759e8f0d3d0eb738f2dbca5bf1  apache-solr-1.4.1.zip

### Things I looked at

 - recursive diff of tgz and zip artifacts to make sure they match
   - ignoring line endings for obvious reasons
 - basic usage of example via the tutorial
 - recursive diff of 1.4.0 with 1.4.1 
   - verify that file lists are the same, except where expected
   - detailed reviewed all src diffs 

### Notes

all src diffs correspond with my expectations based on CHANGES.txt

--

the javadocs seem to have been built with javadoc 1.6 (not 1.5)
... this seems to have changed the link style somewhat significantly.
doesn't seem like it's a problem, but it was a little hard to review
for changes.  If we do another RC it may be worthwhile to ensure
Javadoc 1.5 is used for consistency with Solr 1.4.0

(if JAVA_HOME is set to a 1.5 java install during build this shouldn't
be a problem) 

--

there is an odd looking "bin" directory at the top level of the
release containing all the test files and the *.class files.  not sure
where that came from (artifact of miller's IDE?) ... it's 7MB of cruft.

--

I'm seeing multiple velocity contrib jars w/odd file names...

$ ls contrib/velocity/src/main/solr/lib/*solr*
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.1.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4.2-dev.jar
contrib/velocity/src/main/solr/lib/apache-solr-velocity-X.Y.M.jar

...this is probably a show stopper that will require a new RC.  

having multiple versions of the jar could easily break things with
class incompatibilities depending on what version they get loaded in
by the classloader. 

(FWIW: all of the "solr" jars are supposed to be in "dist" ... but
it looks like that dir is where the apache-solr-velocity jar was in
1.4.0 as well, so probably not something we should change in this
release) 

--

the example has an uncompressed war, a request log file, and
an already constructed index in it...

Only in tgz/apache-solr-1.4.1/example/logs: 2010_06_16.request.log
Only in tgz/apache-solr-1.4.1/example/solr: data
Only in tgz/apache-solr-1.4.1/example/work: 
Jetty_0_0_0_0_8983_solr.war__solr__k1kf17

...these aren't supposed to be there (but probably don't hurt much)

--

version info says...

Solr Implementation Version: 1.4.1 954930M - mark - 2010-06-15 11:23:44

...that "M" indicates local modifications.  it's not really a blocker,
but ideally the build should be from a known reproducible state of
SVN (ie: since it looks like we may do another release candidate,
let's make sure it's an unmodified checkout, but it's not a huge deal)

---


-Hoss





[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879569#action_12879569
 ] 

Ryan McKinley commented on LUCENE-2490:
---

yes, I think maven *can* deal with something other than -SNAPSHOT.jar files, 
but it would require getting the pom files all sorted -- a non-trivial task (at 
least something beyond my skills/attention).

Changing to -SNAPSHOT makes the maven stuff work easy peasy, BUT it breaks the 
back compat tests that assume "-dev" and I have not figured out how to fix it 
yet.

If you can get things to work keeping the -dev (and still marked as a SNAPSHOT 
build) that would be great!


> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




[jira] Commented: (LUCENE-2490) 'ant generate-maven-artifacts' should work for lucene+solr 3.x+

2010-06-16 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879566#action_12879566
 ] 

Steven Rowe commented on LUCENE-2490:
-

Ryan, why is it necessary to switch from -dev to -SNAPSHOT?  Maven can deal 
with -dev as a version suffix, can't it?

I ask because I'm looking at adding functionality in Lucene and Solr to install 
Maven jars in the user's local repository, and I was planning on going the 
route of making everything -dev.

I'll add a link here after I make an issue and put up a patch.



> 'ant generate-maven-artifacts' should work for lucene+solr 3.x+
> ---
>
> Key: LUCENE-2490
> URL: https://issues.apache.org/jira/browse/LUCENE-2490
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2490-maven.patch, LUCENE-2490-maven.patch
>
>
> The maven build scripts need to be updated so that solr uses the artifacts 
> from lucene.
> For consistency, we should be able to have a different 'maven_version' than 
> the 'version'. That is, we want to build 3.1-SNAPSHOT with a jar file 
> 3.1-dev.




[jira] Commented: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879546#action_12879546
 ] 

Yonik Seeley commented on LUCENE-2378:
--

OK, here's an updated TestEnumPerf.java that tests iteration over docs.  It's 
obviously the culprit.
Args: 100 999 10 10  (same as before, just fewer iterations).

trunk=11146ms, branch_3x=4271ms.  trunk is 160% slower!

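For reference, a minimal sketch of the kind of pre-flex term/doc enumeration 
being timed on branch_3x (an assumed shape, not the attached TestEnumPerf.java):

{code}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;

// Walk every posting of every term through the pre-flex API.
static long enumerateAll(Directory dir) throws Exception {
  IndexReader reader = IndexReader.open(dir, true); // read-only reader
  long sum = 0;
  TermEnum terms = reader.terms();
  TermDocs docs = reader.termDocs();
  while (terms.next()) {
    docs.seek(terms);            // position the doc enum on the current term
    while (docs.next()) {
      sum += docs.doc() + docs.freq();
    }
  }
  docs.close();
  terms.close();
  reader.close();
  return sum;                    // return something so the loops aren't dead code
}
{code}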
> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch, 
> TestEnumPerf.java
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)




[jira] Updated: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Erik Hetzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hetzner updated SOLR-1885:
---

Attachment: (was: Merritt-object-modeling-v05.doc)

> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: Next
>
> Attachments: stream_release_connection_fix.diff, TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Updated: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Erik Hetzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hetzner updated SOLR-1885:
---

Attachment: stream_release_connection_fix.diff

> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: Next
>
> Attachments: stream_release_connection_fix.diff, TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Updated: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Erik Hetzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hetzner updated SOLR-1885:
---

Attachment: Merritt-object-modeling-v05.doc

> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: Next
>
> Attachments: stream_release_connection_fix.diff, TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Commented: (SOLR-1885) StreamingUpdateSolrServer hangs

2010-06-16 Thread Erik Hetzner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879527#action_12879527
 ] 

Erik Hetzner commented on SOLR-1885:


I have been reliably encountering this error; however it takes me a day or two 
to get there.

I have attached a change which may fix this - I will let you know in a day or 
two. This change ensures that a call is made to method.releaseConnection() for 
every call to executeMethod(method); see 
[http://hc.apache.org/httpclient-3.x/threading.html]. The current version of 
StreamingUpdateSolrServer only calls releaseConnection once for possibly more 
than one update call (the posts are wrapped in a do...while block).

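A minimal sketch of the pattern the fix enforces, assuming HttpClient 3.x and 
a simple batch loop (the method and its parameters are illustrative, not the 
actual patch):

{code}
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

// One releaseConnection() for every executeMethod(), even inside a loop.
void runUpdates(HttpClient client, String updateUrl, int batches) throws Exception {
  for (int i = 0; i < batches; i++) {
    PostMethod method = new PostMethod(updateUrl);
    try {
      int status = client.executeMethod(method);
      if (status != 200) {
        throw new Exception("update failed: " + status);
      }
      // ... consume the response body here ...
    } finally {
      method.releaseConnection();  // returns the connection to the pool
    }
  }
}
{code}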


> StreamingUpdateSolrServer hangs
> ---
>
> Key: SOLR-1885
> URL: https://issues.apache.org/jira/browse/SOLR-1885
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: Next
>
> Attachments: TestSolrJPerf.java
>
>
> Looks like we may still have a hanging issue:
> http://search.lucidimagination.com/search/document/90c4a942e18ad572/streamingupdatesolrserver_hangs




[jira] Issue Comment Edited: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879521#action_12879521
 ] 

Yonik Seeley edited comment on LUCENE-2378 at 6/16/10 5:52 PM:
---

Attaching TestEnumPerf.java which tests the performance of iterating over all 
of the terms in an index.

indexing in trunk is ~19% faster
enumerating in trunk is ~9% slower

Java6 -server
Params: 100 999 10 100
Which is 1M docs, maxBufferedDocs=999, unique terms=100,000
(this results in 21 segments)

I haven't tried enumerating docs yet... that's up next.



  was (Author: ysee...@gmail.com):
Attaching TestEnumPerf.java which tests the performance of iterating over 
all of the terms in an index.

indexing in trunk is ~19% faster
enumerating in trunk is ~9% slower

I haven't tried enumerating docs yet... that's up next.
  
> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch, 
> TestEnumPerf.java
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)




[jira] Updated: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2378:
-

Attachment: TestEnumPerf.java

Attaching TestEnumPerf.java which tests the performance of iterating over all 
of the terms in an index.

indexing in trunk is ~19% faster
enumerating in trunk is ~9% slower

I haven't tried enumerating docs yet... that's up next.

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch, 
> TestEnumPerf.java
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)




[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879495#action_12879495
 ] 

Michael McCandless commented on LUCENE-2501:


bq. also, if this boils down to a synchronization error of some sort, the extra 
file io done to write the trace info to disk may add some implicit 
synchronization/slowdown that may result in not being able to reproduce the 
issue

Ahh yes, the Heisenbug 
(http://en.wikipedia.org/wiki/Unusual_software_bug#Heisenbug).  Still it's 
worth a shot to see if we can catch it in action...

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: velocity response writer breaks portability

2010-06-16 Thread Chris Hostetter

: > +0 ... i'd prefer to keep the solr.war as lite as possible, with as few
: > dependencies as possible -- so allowing for lazy loaded response writers
: > seems like a better choice to me -- but i recognize that there are
: > size/feature trade offs.
: 
: There are advantages though - being able to replace the JSP pages with
: something more malleable and potentially overridable.  And it's really not
: that heavy weight.

absolutely -- there are always trade offs, hence i am not opposed 

(although to repeat an earlier comment from jira: i'd still rather move 
towards the Admin UI being implemented entirely with request handler XML 
responses and browser-side XSLT than with velocity -- that way we can be 
*sure* that our request handlers are exposing *everything* needed to power 
the admin UI, so it's all available programmatically as well)



-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879489#action_12879489
 ] 

Tim Smith commented on LUCENE-2501:
---

will do

may take some time before it occurs again

also, if this boils down to a synchronization error of some sort, the extra 
file io done to write the trace info to disk may add some implicit 
synchronization/slowdown that may result in not being able to reproduce the 
issue (i've seen this occur on non-lucene related synchronization issues, add 
the extra debug logging and it never fails anymore)

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: velocity response writer breaks portability

2010-06-16 Thread Erik Hatcher


On Jun 16, 2010, at 4:23 PM, Chris Hostetter wrote:

> : It's always been my intention to move VrW to Solr core.  Any objections to
> : that?
> : And in response to rmuir's question - it definitely makes for a good demo of
> : Solr, though I don't think it is the right demo of low-level Lucene usage.
> : But I'm all for VrW becoming more of a first class citizen.
> : Everyone ok with VrW moving to Solr core?  I'll take care of it if desired.
> 
> +0 ... i'd prefer to keep the solr.war as lite as possible, with as few
> dependencies as possible -- so allowing for lazy loaded response writers
> seems like a better choice to me -- but i recognize that there are
> size/feature trade offs.

There are advantages though - being able to replace the JSP pages with
something more malleable and potentially overridable.  And it's really not
that heavy weight.

DIH is one of those contribs that injects itself in for UI purposes, and
that's kinda ugly.  I think with VrW in core we could extract that and
plugins could have their own UI in the JAR files (or file system).

	Erik


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr 1.4.1 news entry

2010-06-16 Thread Chris Hostetter
: Anyone got any opinions on how we write the news entry? I'm not seeing a
: previous bug fix release for Solr to go by.

we haven't had one before.

: Personally, I'm thinking short and sweet unless I get some suggestions:

i think you're on the money -- the nature of a bug fix release is that 
there aren't any features to hype -- at best we could hype the number of 
bugs fixed, but it would be bad to single out specific bugs because people 
might only upgrade if they think those bugs affect them (and not notice 
that other less-hyped bugs in CHANGES.txt do affect them)




-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r954674 - /lucene/java/dist/KEYS

2010-06-16 Thread Chris Hostetter

: It shouldn't hold a key you don't trust 100%. Presumably the very old
: committers are still happy that their keys are in their possession.

 * if you feel that people can still trust releases signed with that key, 
   then it should stay in the KEYS file.
 * If you don't feel that people can still trust releases signed with that 
   key, then removing that key from KEYS isn't enough -- those releases 
   should also be verified and resigned with a key that can be trusted.

...that is the theory anyway, as i understand it.  Whether it's actually 
worth re-signing old releases (as a practical issue) is another matter.


-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879487#action_12879487
 ] 

Michael McCandless commented on LUCENE-2501:


Can you capture IW.setInfoStream output leading up to it?

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879486#action_12879486
 ] 

Tim Smith commented on LUCENE-2501:
---

ram buffer size is set to 64.0

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: velocity response writer breaks portability

2010-06-16 Thread Chris Hostetter

: Why is Velocity any different than clustering or Solr Cell in this regard?
: Doesn't moving the example directory affect them too?

that was yonik's point about lazy loading -- it doesn't exist for response 
writers, so if you copy/move the example configs solr won't even start w/o 
the velocity contrib (but for the other contribs you only get a failure if 
you try to use them)

: It's always been my intention to move VrW to Solr core.  Any objections to
: that?
: And in response to rmuir's question - it definitely makes for a good demo of
: Solr, though I don't think it is the right demo of low-level Lucene usage.
: But I'm all for VrW becoming more of a first class citizen.
: Everyone ok with VrW moving to Solr core?  I'll take care of it if desired.

+0 ... i'd prefer to keep the solr.war as lite as possible, with as few 
dependencies as possible -- so allowing for lazy loaded response writers 
seems like a better choice to me -- but i recognize that there are 
size/feature trade offs.


-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879485#action_12879485
 ] 

Michael McCandless commented on LUCENE-2501:


Are you certain about IW's RAM buffer size?  If the RAM buffer size was close 
to 2GB it could lead to exceptions like this.

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879483#action_12879483
 ] 

Tim Smith commented on LUCENE-2501:
---

Looks like this may be the original source of the errors

{code}
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order 
(607 <= 607 )
at 
org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:76)
at 
org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:209)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:127)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:144)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:72)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:64)
at 
org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:583)
at 
org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3602)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3511)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3502)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2103)
{code}
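
For context, the guard that trips here is a simple monotonicity check on the 
docIDs written for a term's postings.  Roughly, in 3.0.x's 
FormatPostingsDocsWriter.addDoc (paraphrased, so treat the exact code as an 
approximation rather than a verbatim copy):

{code}
import org.apache.lucene.index.CorruptIndexException;

class DocsOrderCheckSketch {
  private int lastDocID;

  // Postings for one term must arrive in strictly increasing docID order;
  // "607 <= 607" means the same docID was appended twice for a single term,
  // which points at corrupted in-memory postings rather than bad input.
  void addDoc(int docID) throws CorruptIndexException {
    if (docID <= lastDocID) {
      throw new CorruptIndexException("docs out of order (" + docID + " <= "
          + lastDocID + " )");
    }
    lastDocID = docID;
  }
}
{code}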

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2378:



Reopening to make sure we get to the bottom of the perf loss...

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879459#action_12879459
 ] 

Yonik Seeley edited comment on LUCENE-2378 at 6/16/10 3:02 PM:
---

OK, I tested UnInvertedField faceting on branch_3x vs trunk:

phase1 building the UnInvertedField (involves iterating all terms and docs for 
a field): trunk is 31% slower
complete facet request, including returning top 1000 facets (exercises 
NumberedTermsEnum - seeking + iterating over terms): trunk is 10% slower.

  was (Author: ysee...@gmail.com):
OK, I tested UnInvertedField faceting on branch_3x vs trunk:

phase1 building the UnInvertedField (involves iterating all terms and docs for 
a field): trunk is 31% slower
returning top 1000 facets (exercises NumberedTermsEnum - seeking + iterating 
over terms): trunk is 10% slower.
  
> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCY-111) Matcher

2010-06-16 Thread Marvin Humphrey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCY-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marvin Humphrey updated LUCY-111:
-

Attachment: Matcher.bp
Matcher.c

> Matcher
> ---
>
> Key: LUCY-111
> URL: https://issues.apache.org/jira/browse/LUCY-111
> Project: Lucy
>  Issue Type: New Feature
>  Components: Core - Search
>Reporter: Marvin Humphrey
>Assignee: Marvin Humphrey
>Priority: Blocker
> Attachments: Matcher.bp, Matcher.c
>
>
> A Matcher is an object which matches a set of Lucy doc ids, iterating over
> them via Next() and Advance().  It combines the roles of Lucene's
> DocIdSetIterator and Scorer classes.
> Some -- but not all -- Matchers implement a Score() method.  We can refer to
> such Matchers informally as "scorers", but Lucy won't need a Scorer class a la
> Lucene.   In Lucy, Query classes will compile down to Matchers that either
> Score() or don't.  This allows us to perform optimizations on branches of
> compound scorers: compiling "foo AND NOT bar" will produce a scoring Matcher
> for "foo" and a non-scoring Matcher for "bar", since the "bar" branch can
> never contribute to the score.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879459#action_12879459
 ] 

Yonik Seeley commented on LUCENE-2378:
--

OK, I tested UnInvertedField faceting on branch_3x vs trunk:

phase1 building the UnInvertedField (involves iterating all terms and docs for 
a field): trunk is 31% slower
returning top 1000 facets (exercises NumberedTermsEnum - seeking + iterating 
over terms): trunk is 10% slower.

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the back 
> compat emulation layer.
> But we should at some point cut them all over to flex.  Latest we can do this 
> is 4.0, but I'm not sure we should do them all for 3.1... still marking this 
> as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2502) Remove some unused code in Surround query parser

2010-06-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879457#action_12879457
 ] 

Paul Elschot edited comment on LUCENE-2502 at 6/16/10 2:55 PM:
---

Remove getTermsEnum method from SpanNearClauseFactory.
The patch was generated from an svn diff in the contrib directory.

  was (Author: paul.elsc...@xs4all.nl):
Remove getTermsEnum method from SpanNearClauseFactory
  
> Remove some unused code in Surround query parser
> 
>
> Key: LUCENE-2502
> URL: https://issues.apache.org/jira/browse/LUCENE-2502
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 4.0
>Reporter: Paul Elschot
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2502.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2502) Remove some unused code in Surround query parser

2010-06-16 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-2502:
-

Attachment: LUCENE-2502.patch

Remove getTermsEnum method from SpanNearClauseFactory

> Remove some unused code in Surround query parser
> 
>
> Key: LUCENE-2502
> URL: https://issues.apache.org/jira/browse/LUCENE-2502
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 4.0
>Reporter: Paul Elschot
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2502.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2502) Remove some unused code in Surround query parser

2010-06-16 Thread Paul Elschot (JIRA)
Remove some unused code in Surround query parser


 Key: LUCENE-2502
 URL: https://issues.apache.org/jira/browse/LUCENE-2502
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 4.0
Reporter: Paul Elschot
Priority: Trivial
 Fix For: 4.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

2010-06-16 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-1954:
---

Attachment: SOLR-1954_start_and_end_offsets.patch

Implementation for the default highlighter.  Includes a basic test.

> Highlighter component should expose snippet character offsets and the score.
> 
>
> Key: SOLR-1954
> URL: https://issues.apache.org/jira/browse/SOLR-1954
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets nor the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is there so its pretty easy to 
> output it in some way.  The challenge is deciding on the output and its 
> ramifications on backwards compatibility.  The current highlighter component 
> response structure doesn't lend itself to adding any new data, unfortunately. 
>  I wish the original implementer had some foresight.  Unfortunately all the 
> highlighting tests assume this structure.  Here is a snippet of the current 
> response structure in Solr's sample data searching for "sdram" for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="name">
>    <str>CORSAIR ValueSelect 1GB 184-Pin DDR <em>SDRAM</em> 
> Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps as a little hack, we introduce a pseudo field called 
> text_startCharOffset which is the concatenation of the matching field and 
> "_startCharOffset".  This would be an array of ints.  Likewise, there would 
> be another array for endCharOffset and score.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCY-111) Matcher

2010-06-16 Thread Marvin Humphrey (JIRA)
Matcher
---

 Key: LUCY-111
 URL: https://issues.apache.org/jira/browse/LUCY-111
 Project: Lucy
  Issue Type: New Feature
  Components: Core - Search
Reporter: Marvin Humphrey
Assignee: Marvin Humphrey
Priority: Blocker


A Matcher is an object which matches a set of Lucy doc ids, iterating over
them via Next() and Advance().  It combines the roles of Lucene's
DocIdSetIterator and Scorer classes.

Some -- but not all -- Matchers implement a Score() method.  We can refer to
such Matchers informally as "scorers", but Lucy won't need a Scorer class a la
Lucene.   In Lucy, Query classes will compile down to Matchers that either
Score() or don't.  This allows us to perform optimizations on branches of
compound scorers: compiling "foo AND NOT bar" will produce a scoring Matcher
for "foo" and a non-scoring Matcher for "bar", since the "bar" branch can
never contribute to the score.
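
A rough Java-flavored sketch of that contract (Lucy's core is C, so the names 
and signatures below are illustrative only, not Lucy's actual API):

{code}
// Illustrative only: a Java rendering of the Matcher contract described above.
abstract class Matcher {
  /** Advance to the next matching doc id (a sentinel value when exhausted). */
  abstract int next();

  /** Skip to the first match at or after target. */
  abstract int advance(int target);

  /**
   * Only "scorers" implement this.  A non-scoring Matcher -- e.g. the "bar"
   * branch compiled from "foo AND NOT bar" -- never has its score consulted,
   * which is what makes the branch optimization above possible.
   */
  float score() { throw new UnsupportedOperationException("not a scorer"); }
}
{code}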

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (LUCENE-2410) Optimize PhraseQuery

2010-06-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2410:
---

Attachment: LUCENE-2410.patch

Attached initial rough patch, doing the 1st and 3rd bullets above.
Still many nocommits, but all tests pass.

I only did this for the exact case (I don't understand the sloppy
case!), so I modified ExactPhraseScorer to no longer subclass
PhraseScorer and instead do everything on its own.

I tested on a 20M doc Wikipedia index, best of 10 runs:

||Query||No. hits||Trunk QPS||Patch QPS||Speedup||
|United States|314K|4.29|11.04|2.6X faster|
|United Kingdom Parliament|7K|20.33|58.57|2.9X faster|

The speedup is great :)

However, there's one problem w/ the patch that I must fix (and will
bring these gains down), which is it requires 2 int arrays sized to
the max position encountered during the search (which for a large doc
could be very large).  I think to make this committable I'd have to
switch to processing the positions in chunks (like BooleanScorer).
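
For the curious, a toy rendering of the count/gen idea from the issue 
description (no chunking yet, so the arrays are sized to maxPosition+1 -- 
exactly the problem noted above; all names here are made up):

{code}
class ExactPhraseSketch {
  // positions[i] holds the sorted positions of the i-th phrase term in the
  // current doc.  counts/gens are allocated once by the caller, sized to
  // maxPosition+1, with gens initially filled with -1.
  static int countExactMatches(int[][] positions, int docID,
                               int[] counts, int[] gens) {
    final int numTerms = positions.length;
    int matches = 0;
    for (int i = 0; i < numTerms; i++) {
      for (int pos : positions[i]) {
        final int start = pos - i;        // candidate phrase start position
        if (start < 0) continue;
        if (gens[start] != docID) {       // the "gen" trick: lazy reset,
          gens[start] = docID;            // so the arrays are never cleared
          counts[start] = 0;
        }
        if (++counts[start] == numTerms) {
          matches++;                      // every term lined up at this start
        }
      }
    }
    return matches;
  }
}
{code}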


> Optimize PhraseQuery
> 
>
> Key: LUCENE-2410
> URL: https://issues.apache.org/jira/browse/LUCENE-2410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2410.patch, LUCENE-2410_rewrite.patch
>
>
> Looking the scorers for PhraseQuery, I think there are some speedups
> we could do:
>   * The AND part of the scorer (which advances to the next doc that
> has all the terms), in PhraseScorer.doNext, should do the same
> optimizing as BooleanQuery's ConjunctionScorer, ie sort terms from
> rarest to most frequent.  I don't think it should use a linked
> list/firstToLast() that it does today.
>   * We do way too much work now when .score() is not called, because
> we go and find all occurrences of the phrase in the doc, whereas
> we should stop only after finding the first and then go and count
> the rest if .score() is called.
>   * For the exact case, I think we can use two int arrays to find the
> matches.  The first array holds the count of how many times a term
> in the phrase "matched" a phrase starting at that position.  When
> that count == the number of terms in the phrase, it's a match.
> The 2nd is a "gen" array (holds docID when that count was last
> touched), to avoid clearing.  Ie when incrementing the count, if
> the docID != gen, we reset count to 0.  I think this'd be faster
> than the PQ we now use.  Downside of this is if you have immense
> docs (position gets very large) we'd need 2 immense arrays.
> It'd be great to do LUCENE-1252 along with this, ie factor
> PhraseScorer into two AND'd sub-scorers (LUCENE-1252 is open for
> this).  The first one should be ConjunctionScorer, and the 2nd one
> checks the positions (ie, either the exact or sloppy scorers).  This
> would mean if the PhraseQuery is AND'd w/ other clauses (or, a filter
> is applied) we would save CPU by not checking the positions for a doc
> unless all other AND'd clauses accepted the doc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: custom indexing

2010-06-16 Thread John Wang
Awesome! Thanks Michael!

-John

On Wed, Jun 16, 2010 at 7:53 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Wed, Jun 16, 2010 at 10:30 AM, John Wang  wrote:
> > Thanks Michael!
> > For 1), I only see the api to get the uniqueTerms for the entire reader,
> not
> > for a specific field. Am I looking at the wrong place?
>
> Ahh sorry I missed that you need it per-field.  Yes, flex now makes it
> possible.  If the reader is composite, do this:
>
>  MultiFields.getTerms(reader, field).getUniqueTermCount();
>
> else (definitely a single segment):
>
>  reader.fields().terms(field).getUniqueTermCount()
>
> (But you should null-check the returned Fields (in case reader has no
> fields) and Terms (in case the specified field does not exist)).
>
> > 2) Awesome!!! Is there a wiki on flex indexing somewhere?
>
> There's a start at http://wiki.apache.org/lucene-java/FlexibleIndexing
>
> But it doesn't document in detail how to make your own Codec --
> probably simplest way to get started is look @ the core Codecs.
>
> Mike
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
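
Putting Mike's two cases and the null checks together -- a small helper 
sketch; the method name is made up, the calls are as described above:

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;

class TermCountUtil {
  // MultiFields.getTerms handles composite readers; for a single-segment
  // reader it is equivalent to reader.fields().terms(field).
  static long uniqueTermCount(IndexReader reader, String field) throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    return terms == null ? 0 : terms.getUniqueTermCount();
  }
}
{code}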


[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879422#action_12879422
 ] 

Tim Smith commented on LUCENE-2501:
---

Some more info:

ingestion is being performed in multiple threads

ArrayIndexOutOfBounds exception is occurring in bursts
I suspect that these bursts of exceptions stop after the next commit (at which 
point the buffers are all reset) 
NOTE: i have not yet confirmed this, but i suspect it


> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879403#action_12879403
 ] 

Tim Smith commented on LUCENE-2501:
---

Here's all the info i have available right now (will try to get more):

16 core, 18-gig ram Windows 7 machine
1 JVM
16 index writers (each using default settings (64M ram, etc))
300+ docs/sec ingestion (small documents)
commit every 10 minutes
optimize every hour

The report i got indicated that every now and then one of these 
ArrayIndexOutOfBounds exceptions would occur
this would cause the document being indexed to fail, but otherwise things 
would continue normally


> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879399#action_12879399
 ] 

Michael McCandless commented on LUCENE-2501:


What sized RAM buffer was being used for IW when this exception happened?

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879389#action_12879389
 ] 

Michael McCandless commented on LUCENE-2501:


Is this issue repeatable, on a different machine?

We do have a randomized test for this (TestByteSlices) -- I'll go start it w/ 
big random.multiplier.  Maybe it can uncover this ;)

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: velocity response writer breaks portability

2010-06-16 Thread Erik Hatcher
For some reason I didn't get Yonik's original message today, but I spotted it
elsewhere -

> In solr 3x, the velocity response writer was added to the example config.
>
> Unfortunately, this breaks the ability to move the "example" server
> somewhere else and have it work.
>
> Here are some options to fix this off the top of my head:
> a) comment out velocity / move it out of the example
> b) move the needed velocity jars inside the solr war or the example lib
> c) make a missing response writer not a critical error
> d) implement lazy response writers like we do lazy request handlers so
> that there is only an error if use is attempted
>
> Thoughts?
>
> -Yonik

Why is Velocity any different than clustering or Solr Cell in this regard?
Doesn't moving the example directory affect them too?

It's always been my intention to move VrW to Solr core.  Any objections to
that?

And in response to rmuir's question - it definitely makes for a good demo of
Solr, though I don't think it is the right demo of low-level Lucene usage.
But I'm all for VrW becoming more of a first class citizen.

Everyone ok with VrW moving to Solr core?  I'll take care of it if desired.

	Erik
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1949) overwrite document fails if Solr index is not optimized

2010-06-16 Thread Miguel B. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879385#action_12879385
 ] 

Miguel B. commented on SOLR-1949:
-

We have been running a lot of tests and now we can't reproduce the same 
error. At this point I can say that the issue doesn't exist, so we don't need 
to use expungeDeletes=true. We don't know what really happened; it may have 
been resolved because we removed the whole data directory before running the 
new tests.

We will continue with the tests.

My apologies.



> overwrite document fails if Solr index is not optimized
> ---
>
> Key: SOLR-1949
> URL: https://issues.apache.org/jira/browse/SOLR-1949
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.4
> Environment: linux centos
>Reporter: Miguel B.
>
> Scenario:
> - Solr 1.4 with multicore
> - We have a set of 5,000 source documents that we want to index.
> - We send this set to Solr via the SolrJ API and the documents are added 
> correctly. We have a string ID field as the uniqueKey, so the update 
> operation overwrites documents with the same ID. The result is 4,500 unique 
> documents in Solr. All documents also have an indexed field that contains 
> the source repository of each document; we need it because we want to index 
> other sources as well.
> - After the add operation, we send an optimize.
>  
> All works fine at this point.  The Solr core has 4,500 documents (and 4,500 
> max documents too).
>  
> Now these 5,000 source documents are updated by users, and a set of them 
> (suppose 1,000) are deleted. So now we want to update our Solr index with 
> these changes (unfortunately our repository doesn't support an incremental 
> approach). The operations are:
>  
>  - On the Solr index, delete documents by query (on the field that contains 
> the document's source repository). We use the SolrJ deleteByQuery and commit 
> operations.
>  - At this point the Solr core has 0 documents (but still 4,500 max 
> documents, important!!!)
>  - Now we add the new versions of the source documents (4,000) to Solr. 
> Remember that documents don't have unique identifiers; suppose the unique 
> items number 3,000. So when the add operation finishes (after the commit is 
> sent) the Solr index must have 3,000 unique items.
>  
> But the result isn't 3,000 unique items; we obtain random results: 3000, 
> 2980, 2976, etc. It's a serious problem because we lose documents.
> We have a workaround: just after the delete operation, we send an optimize 
> to Solr (so maxDocuments is updated). After this, we send the new documents. 
> This way the result is always correct.
> In our tests, we can see that this issue occurs only when the new documents 
> overwrite documents that already existed in Solr.
> Thanks!!
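
For reference, a SolrJ 1.4-era sketch of the reported workaround; the core 
URL, the "repository" field name, and its value are placeholders:

{code}
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

class ReindexSketch {
  static void reindexSource(Collection<SolrInputDocument> docs) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
    solr.deleteByQuery("repository:sourceA");  // wipe one source's documents
    solr.commit();
    solr.optimize();  // the workaround: optimize right after the delete,
                      // before re-adding the new document versions
    solr.add(docs);
    solr.commit();
  }
}
{code}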

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879382#action_12879382
 ] 

Tim Smith commented on LUCENE-2501:
---

that's what i was afraid of

i got this report second hand, so i don't have access to the data that was 
being ingested

and i currently don't know enough about this section of the indexing code to 
make an educated guess in order to create a unit test
i'll try to create a test, but i expect it will be difficult (especially if no 
one else has ever seen this)

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879379#action_12879379
 ] 

Michael McCandless commented on LUCENE-2501:


Hmmm, not good.  Can you boil this down to a smallish test case?

level should never be > 9, because nextLevelArray[*] is no greater than 9.  
Something more serious is up...
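
For reference, the relevant level tables in ByteBlockPool (reproduced from 
memory from the 3.0.x sources, so verify against your tree):

{code}
// nextLevelArray caps at 9, so a level extracted by "slice[upto] & 15"
// should always be a valid index here; seeing 14 implies the end-marker
// byte itself was overwritten.
final static int[] nextLevelArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
final static int[] levelSizeArray = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};
{code}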

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> This can result in "level" being any value between 0 and 15, but the array 
> nextLevelArray is only of size 10 (valid indices 0-9).
> I suspect the solution would be either to cap the level at 9 or to add more 
> entries to nextLevelArray so it has 16.
> However, I don't know if something more is going wrong here and this is just 
> where the exception surfaces from a deeper issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2010-06-16 Thread Tim Smith (JIRA)
ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
--

 Key: LUCENE-2501
 URL: https://issues.apache.org/jira/browse/LUCENE-2501
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1
Reporter: Tim Smith


I'm seeing the following exception during indexing:
{code}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
at 
org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
at 
org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
at 
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
... 37 more
{code}


This seems to be caused by the following code:
{code}
final int level = slice[upto] & 15;
final int newLevel = nextLevelArray[level];
final int newSize = levelSizeArray[newLevel];
{code}

This can result in "level" being any value between 0 and 15, but the array 
nextLevelArray is only of size 10 (valid indices 0-9).

I suspect the solution would be either to cap the level at 9 or to add more 
entries to nextLevelArray so it has 16.
However, I don't know if something more is going wrong here and this is just 
where the exception surfaces from a deeper issue.
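
(Purely to illustrate the "cap the level" suggestion above -- a defensive 
sketch, not a real fix, since any underlying corruption would remain:)

{code}
// Hypothetical defensive variant: clamp the decoded level into the valid
// 0..9 range so a corrupted end byte cannot index past nextLevelArray.
final int level = Math.min(slice[upto] & 15, nextLevelArray.length - 1);
final int newLevel = nextLevelArray[level];
final int newSize = levelSizeArray[newLevel];
{code}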



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: velocity response writer breaks portability

2010-06-16 Thread Robert Muir
Can velocity be moved to Solr core?

Furthermore, can the velocity + Solr example replace the Lucene "demo"?


On Wed, Jun 16, 2010 at 11:32 AM, Yonik Seeley
wrote:

> In Solr 3.x, the velocity response writer was added to the example config.
> Unfortunately, this breaks the ability to move the "example" server
> somewhere else and have it work.
>
> Here are some options to fix this off the top of my head:
> a) comment out velocity / move it out of the example
> b) move the needed velocity jars inside the solr war or the example lib
> c) make a missing response writer not a critical error
> d) implement lazy response writers like we do lazy request handlers so
> that there is only an error if use is attempted
>
> Thoughts?
>
> -Yonik
> http://www.lucidimagination.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
Robert Muir
rcm...@gmail.com


velocity response writer breaks portability

2010-06-16 Thread Yonik Seeley
In Solr 3.x, the velocity response writer was added to the example config.
Unfortunately, this breaks the ability to move the "example" server
somewhere else and have it work.

Here are some options to fix this off the top of my head:
a) comment out velocity / move it out of the example
b) move the needed velocity jars inside the solr war or the example lib
c) make a missing response writer not a critical error
d) implement lazy response writers like we do lazy request handlers so
that there is only an error if use is attempted
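
For (d), a hypothetical sketch of a lazy wrapper, loosely modeled on the lazy 
request handler wrapper (the class name is invented; QueryResponseWriter and 
SolrResourceLoader are real, but double-check newInstance()'s exact signature, 
and imports are omitted since package locations have moved between versions):

  // Defer loading the configured writer class until the first request that
  // asks for it, so a missing velocity jar fails that one request instead
  // of core startup.
  public class LazyQueryResponseWriter implements QueryResponseWriter {
    private final SolrResourceLoader loader;
    private final String className;
    private final NamedList args;
    private volatile QueryResponseWriter delegate;

    public LazyQueryResponseWriter(SolrResourceLoader loader,
                                   String className, NamedList args) {
      this.loader = loader;
      this.className = className;
      this.args = args;
    }

    private QueryResponseWriter delegate() {
      if (delegate == null) {
        synchronized (this) {
          if (delegate == null) {
            QueryResponseWriter w =
                (QueryResponseWriter) loader.newInstance(className);
            w.init(args);  // class loading / init errors surface here
            delegate = w;
          }
        }
      }
      return delegate;
    }

    public void init(NamedList args) { /* captured in the constructor */ }

    public void write(Writer out, SolrQueryRequest req, SolrQueryResponse rsp)
        throws IOException {
      delegate().write(out, req, rsp);
    }

    public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
      return delegate().getContentType(req, rsp);
    }
  }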

Thoughts?

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2378.


Resolution: Fixed

Whoops -- see LUCENE-2380 for the commit (I typed the wrong issue).

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the 
> back-compat emulation layer.
> But we should at some point cut them all over to flex.  The latest we can do 
> this is 4.0, but I'm not sure we should do them all for 3.1... still marking 
> this as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879354#action_12879354
 ] 

Michael McCandless commented on LUCENE-2378:


Great!  I'll commit now...

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the 
> back-compat emulation layer.
> But we should at some point cut them all over to flex.  The latest we can do 
> this is 4.0, but I'm not sure we should do them all for 3.1... still marking 
> this as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: custom indexing

2010-06-16 Thread Michael McCandless
On Wed, Jun 16, 2010 at 10:30 AM, John Wang  wrote:
> Thanks Michael!
> For 1), I only see the API to get the unique term count for the entire 
> reader, not for a specific field. Am I looking in the wrong place?

Ah, sorry, I missed that you need it per-field.  Yes, flex now makes it
possible.  If the reader is composite, do this:

  MultiFields.getTerms(reader, field).getUniqueTermCount();

else (definitely a single segment):

  reader.fields().terms(field).getUniqueTermCount()

But you should null-check the returned Fields (in case the reader has no 
fields) and Terms (in case the specified field does not exist).
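
Putting that together with the null checks, a minimal helper might look like 
this (a sketch against the flex API described above; countUniqueTerms is a 
made-up name, and the classes are org.apache.lucene.index.{IndexReader, 
MultiFields, Terms}):

  // Null-safe per-field unique term count.  MultiFields.getTerms returns
  // null when the reader has no fields or the field doesn't exist, and it
  // handles both composite and single-segment readers.
  static long countUniqueTerms(IndexReader reader, String field)
      throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    return terms == null ? 0 : terms.getUniqueTermCount();
  }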

> 2) Awesome!!! Is there a wiki on flex indexing somewhere?

There's a start at http://wiki.apache.org/lucene-java/FlexibleIndexing

But it doesn't document in detail how to make your own Codec -- probably the 
simplest way to get started is to look at the core Codecs.

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2378:
-

Attachment: LUCENE-2378_UnInvertedField.patch

Here's the same sort of patch to UnInvertedField - it avoids the String-based 
conversion and thus will work better if any terms are true binary.

I think we're good to commit!
I will still do some performance tests - but that shouldn't hold this up IMO.
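
(For context, the flex pattern that avoids the String round-trip looks roughly 
like this -- a generic sketch, not the actual UnInvertedField patch:)

{code}
// Iterate terms as raw bytes via the flex API; no new String(...) per term.
// (Null-check getTerms in real code in case the field does not exist.)
TermsEnum te = MultiFields.getTerms(reader, field).iterator();
BytesRef term;
while ((term = te.next()) != null) {
  // term.bytes[term.offset .. term.offset + term.length) holds the raw term
}
{code}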

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch, LUCENE-2378_UnInvertedField.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the 
> back-compat emulation layer.
> But we should at some point cut them all over to flex.  The latest we can do 
> this is 4.0, but I'm not sure we should do them all for 3.1... still marking 
> this as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: custom indexing

2010-06-16 Thread John Wang
Thanks Michael!

For 1), I only see the API to get the unique term count for the entire reader,
not for a specific field. Am I looking in the wrong place?

2) Awesome!!! Is there a wiki on flex indexing somewhere?

-John

On Wed, Jun 16, 2010 at 2:37 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> 1) Yes, in fact you needn't wait for flex for this --
> IndexReader.getUniqueTermCount was added in 2.9.  But this will throw
> UOE on composite readers (Multi/DirReader).
>
> 2) Yes, you can make a Codec that separately maintains your own files,
> both on initial flush and on merge.  Make sure your Codec.files()
> returns your new files, so IndexFileDeleter doesn't delete them!
>
> Mike
>
> On Tue, Jun 15, 2010 at 5:29 PM, John Wang  wrote:
> > Hi:
> > Great job on the flex indexing feature! This opens new doors for how an
> > application can use Lucene for its use cases.
> > I have a couple of questions that I brought up before; the answer was to
> > wait for flex indexing. Now that flex indexing seems to be in good shape, I
> > thought I'd bring it up again:
> > 1) Is it possible to obtain the unique term count for a given field,
> > e.g. getUniqueTermCount(String field) on the segment reader?
> > 2) Is it possible to use Lucene's segment/merge mechanism to encode custom
> > segment files, my own StoredData format, or my own forward index for some
> > field, etc.?
> > Thanks
> > -John
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] Commented: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

2010-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879312#action_12879312
 ] 

Robert Muir commented on LUCENE-2413:
-

Thanks Steven, committed revision 955203 of your patch.

> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> ---
>
> Key: LUCENE-2413
> URL: https://issues.apache.org/jira/browse/LUCENE-2413
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2413-charfilter.patch, 
> LUCENE-2413-dir-and-package-fixes.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_coreAnalyzers.patch, 
> LUCENE-2413_coreUtils.patch, LUCENE-2413_folding.patch, 
> LUCENE-2413_htmlstrip.patch, LUCENE-2413_icu.patch, 
> LUCENE-2413_keep_hyphen_trim.patch, LUCENE-2413_keyword.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_pattern.patch, LUCENE-2413_porter.patch, 
> LUCENE-2413_removeDups.patch, LUCENE-2413_synonym.patch, 
> LUCENE-2413_teesink.patch, LUCENE-2413_test4.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_tests2.patch, LUCENE-2413_tests3.patch, LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also empowers apps to separately 
> version the analyzers from which version of Solr/Lucene they use, possibly 
> enabling us to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2378) Cutover remaining usage of pre-flex APIs

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879291#action_12879291
 ] 

Michael McCandless commented on LUCENE-2378:


Thanks Yonik -- your changes to FileFloatSource look good!  I'll merge w/ my 
patch, and commit soon...

> Cutover remaining usage of pre-flex APIs
> 
>
> Key: LUCENE-2378
> URL: https://issues.apache.org/jira/browse/LUCENE-2378
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2378.patch, LUCENE-2378.patch, 
> LUCENE-2378_FileFloatSource.patch
>
>
> A number of places still use the pre-flex APIs.
> This is actually healthy, since it gives us ongoing testing of the 
> back-compat emulation layer.
> But we should at some point cut them all over to flex.  The latest we can do 
> this is 4.0, but I'm not sure we should do them all for 3.1... still marking 
> this as 3.1 to "remind us" :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: custom indexing

2010-06-16 Thread Michael McCandless
1) Yes, in fact you needn't wait for flex for this --
IndexReader.getUniqueTermCount was added in 2.9.  But this will throw
UOE on composite readers (Multi/DirReader).

2) Yes, you can make a Codec that separately maintains your own files,
both on initial flush and on merge.  Make sure your Codec.files()
returns your new files, so IndexFileDeleter doesn't delete them!
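
A rough sketch of that last point (the files() signature here is approximated 
from memory of trunk and "fwd" is a made-up extension -- treat this as 
illustration only, imports omitted):

  // In your Codec subclass: report your custom per-segment files as live
  // so IndexFileDeleter keeps them across flushes and merges.
  @Override
  public void files(Directory dir, SegmentInfo segmentInfo, Set<String> files)
      throws IOException {
    files.add(IndexFileNames.segmentFileName(segmentInfo.name, "fwd"));
  }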

Mike

On Tue, Jun 15, 2010 at 5:29 PM, John Wang  wrote:
> Hi:
>     Great job on the flex indexing feature! This opens new doors for how an
> application can use Lucene for its use cases.
>     I have a couple of questions that I brought up before; the answer was to
> wait for flex indexing. Now that flex indexing seems to be in good shape, I
> thought I'd bring it up again:
> 1) Is it possible to obtain the unique term count for a given field,
> e.g. getUniqueTermCount(String field) on the segment reader?
> 2) Is it possible to use Lucene's segment/merge mechanism to encode custom
> segment files, my own StoredData format, or my own forward index for some
> field, etc.?
> Thanks
> -John

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2056) Should NIOFSDir use direct ByteBuffers?

2010-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879290#action_12879290
 ] 

Michael McCandless commented on LUCENE-2056:


Thanks Steven!  Was this with the above alg (i.e., 4 threads doing searching)?

Could you also try the search using NIOFSDirectory?

Also, if possible, it'd be better to test against a larger index -- such 
super-fast queries allow the query init cost to unduly impact the results 
(e.g., allocating a direct buffer is more costly than allocating a non-direct 
buffer).
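
(To isolate just that allocation-cost point, a toy snippet along these lines -- 
standard java.nio only, and the numbers will of course vary by platform:)

{code}
import java.nio.ByteBuffer;

// Toy comparison of heap vs direct ByteBuffer allocation cost.  Direct
// buffers require native, page-aligned allocation, so allocating one per
// query would penalize very short queries the most.
public class BufferAllocCost {
  public static void main(String[] args) {
    final int iters = 10000, size = 1024;
    long t0 = System.nanoTime();
    for (int i = 0; i < iters; i++) ByteBuffer.allocate(size);
    long heap = System.nanoTime() - t0;
    t0 = System.nanoTime();
    for (int i = 0; i < iters; i++) ByteBuffer.allocateDirect(size);
    long direct = System.nanoTime() - t0;
    System.out.println("heap: " + heap / 1e6 + " ms, direct: " + direct / 1e6 + " ms");
  }
}
{code}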

> Should NIOFSDir use direct ByteBuffers?
> ---
>
> Key: LUCENE-2056
> URL: https://issues.apache.org/jira/browse/LUCENE-2056
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-2056.patch
>
>
> I'm trying to test NRT performance, and noticed when I dump the thread stacks 
> that the darned threads often seem to be in 
> {{java.nio.Bits.copyToByteArray(Native Method)}}... so I wondered whether we 
> could/should use direct ByteBuffers, and whether that would gain performance 
> in general.  We currently just use our own byte[] buffer via 
> BufferedIndexInput.
> It's hard to test since it's likely platform-specific, but if it does result 
> in gains it could be an easy win.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Hudson build is back to normal : Solr-trunk #1181

2010-06-16 Thread Apache Hudson Server
See 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org