[jira] Resolved: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-1540.
-

Resolution: Fixed

OK, no new failures; closing as fixed. Thanks Shai and Robert for your help here!

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1395) Integrate Katta

2011-02-06 Thread JohnWu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991279#comment-12991279
 ] 

JohnWu commented on SOLR-1395:
--

TomLiu:

As you said, QueryComponent returns a DocSlice, but XMLWriter or the EmbeddedServer 
returns a SolrDocumentList built from the DocList.

I set the requestHandler to solr.MultiEmbeddedSearchHandler, but the 
QueryComponent still returns a DocSlice.

Can you give me some advice?

Thanks!

JohnWu


> Integrate Katta
> ---
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
> Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
> back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
> katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
> log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
> solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
> solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
> solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
> solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
> zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned LUCENE-2909:
--

Assignee: Koji Sekiguchi

> NGramTokenFilter may generate offsets that exceed the length of original text
> -
>
> Key: LUCENE-2909
> URL: https://issues.apache.org/jira/browse/LUCENE-2909
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 2.9.4
>Reporter: Shinya Kasatani
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: TokenFilterOffset.patch
>
>
> When using NGramTokenFilter combined with CharFilters that lengthen the 
> original text (such as "ß" -> "ss"), the generated offsets exceed the length 
> of the original text.
> This causes InvalidTokenOffsetsException when you try to highlight the text 
> in Solr.
> While it is not possible to know the accurate offset of each character once 
> you tokenize the whole text with tokenizers like KeywordTokenizer, 
> NGramTokenFilter should at least avoid generating invalid offsets.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-06 Thread Shinya Kasatani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinya Kasatani updated LUCENE-2909:


Attachment: TokenFilterOffset.patch

The patch that fixes the problem, including tests.

> NGramTokenFilter may generate offsets that exceed the length of original text
> -
>
> Key: LUCENE-2909
> URL: https://issues.apache.org/jira/browse/LUCENE-2909
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 2.9.4
>Reporter: Shinya Kasatani
>Priority: Minor
> Attachments: TokenFilterOffset.patch
>
>
> When using NGramTokenFilter combined with CharFilters that lengthen the 
> original text (such as "ß" -> "ss"), the generated offsets exceed the length 
> of the original text.
> This causes InvalidTokenOffsetsException when you try to highlight the text 
> in Solr.
> While it is not possible to know the accurate offset of each character once 
> you tokenize the whole text with tokenizers like KeywordTokenizer, 
> NGramTokenFilter should at least avoid generating invalid offsets.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2909) NGramTokenFilter may generate offsets that exceed the length of original text

2011-02-06 Thread Shinya Kasatani (JIRA)
NGramTokenFilter may generate offsets that exceed the length of original text
-

 Key: LUCENE-2909
 URL: https://issues.apache.org/jira/browse/LUCENE-2909
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.9.4
Reporter: Shinya Kasatani
Priority: Minor


When using NGramTokenFilter combined with CharFilters that lengthen the 
original text (such as "ß" -> "ss"), the generated offsets exceed the length of 
the original text.
This causes InvalidTokenOffsetsException when you try to highlight the text in 
Solr.

While it is not possible to know the accurate offset of each character once you 
tokenize the whole text with tokenizers like KeywordTokenizer, NGramTokenFilter 
should at least avoid generating invalid offsets.
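A stripped-down illustration of the arithmetic (plain Java, not Lucene API): the CharFilter lengthens the text, so offsets computed against the filtered text can point past the end of the original.

```java
public class OffsetOverflow {
    public static void main(String[] args) {
        String original = "ß";   // length 1
        String filtered = "ss";  // what the CharFilter hands downstream, length 2
        // A 2-gram over the filtered text spans offsets [0, 2) ...
        int ngramEnd = 2;
        // ... but offset 2 does not exist in the original text, which is what
        // trips InvalidTokenOffsetsException in the Solr highlighter.
        System.out.println(ngramEnd > original.length()); // true
    }
}
```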


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Keyword - search statistics

2011-02-06 Thread Selvaraj Varadharajan
 Hi

   Is there any way I can get the number of times a keyword has been searched in Solr?


*Here is my solr package details*

Solr Specification Version: 1.4.0
Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
12:33:40
Lucene Specification Version: 2.9.1
Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

-Selvaraj


Re: Distributed Indexing

2011-02-06 Thread Alex Cowell
Hey,

We're making good progress, but our DistributedUpdateRequestHandler is
having a bit of an identity crisis, so we thought we'd ask what other
people's opinions are. The current situation is as follows:

We've added a method to ContentStreamHandlerBase to check whether an update
request is distributed (based on the presence/validity of the 'shards'
parameter). A non-distributed request proceeds as normal, while a distributed
request is passed on to the DistributedUpdateRequestHandler to deal with.

The reason this choice is made in the ContentStreamHandlerBase is so that
the DistributedUpdateRequestHandler can use the URL the request came in on
to determine where to distribute update requests. E.g. an update request is
sent to:
http://localhost:8983/solr/update/csv?shards=shard1,shard2...
then the DistributedUpdateRequestHandler knows to send requests to:
shard1/update/csv
shard2/update/csv

Alternatively, if the request wasn't distributed, it would simply be handled
by whichever request handler "/update/csv" uses.
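The URL-based routing described above can be sketched roughly as follows. This is an illustrative sketch only: the class and method names are hypothetical, not the actual Solr API.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardUrlRouter {
    /** Returns one update URL per shard, or an empty list for a non-distributed request. */
    public static List<String> routeUpdate(String requestPath, String shardsParam) {
        List<String> urls = new ArrayList<>();
        if (shardsParam == null || shardsParam.isEmpty()) {
            return urls; // not distributed: handled by the normal request handler
        }
        for (String shard : shardsParam.split(",")) {
            // e.g. "/update/csv" arriving with shards=shard1,shard2
            // yields shard1/update/csv and shard2/update/csv
            urls.add(shard + requestPath);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(routeUpdate("/update/csv", "shard1,shard2"));
        // [shard1/update/csv, shard2/update/csv]
    }
}
```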

Herein lies the problem. The DistributedUpdateRequestHandler is not really a
request handler in the same way as the CSVRequestHandler or
XmlUpdateRequestHandlers are. If anything, it's more like a "plugin" for the
various existing update request handlers, to allow them to deal with
distributed requests - a "distributor" if you will. It isn't designed to be
able to receive and handle requests directly.

We would like this "DistributedUpdateRequestHandler" to be defined in the
solrconfig to allow flexibility for setting up multiple different
DistributedUpdateRequestHandlers with different ShardDistributionPolicies
etc., and also to allow us to get the appropriate instance from the core in
the code. There seem to be two paths for doing this:

1. Leave it as an implementation of SolrRequestHandler and hope the user
doesn't directly send update requests to it (i.e. a request to
http://localhost:8983/solr/ would most likely
cripple something). So it would be defined in the solrconfig something like:


2. Create a new plugin type for the solrconfig, say
"updateRequestDistributor" which would involve creating a new interface for
the DistributedUpdateRequestHandler to implement, then registering it with
the core. It would be defined in the solrconfig something like:

  
solr.HashedDistributionPolicy
  


This would mean that it couldn't directly receive requests, but that an
instance could still easily be retrieved from the core to handle the
distribution of update requests.

Any thoughts on the above issue (or a more succinct, descriptive name for
the class) are most welcome!

Alex


[jira] Issue Comment Edited: (SOLR-2341) Shard distribution policy

2011-02-06 Thread William Mayor (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991225#comment-12991225
 ] 

William Mayor edited comment on SOLR-2341 at 2/6/11 10:00 PM:
--

This patch makes the implemented policy deterministic. This is missing from the 
previous patch. The policy code has also been refactored into its own package.

  was (Author: williammayor):
This patch makes the implemented policy deterministic. This is missing from 
the previous patch. The policy code has also been refactored into it's own 
package.
  
> Shard distribution policy
> -
>
> Key: SOLR-2341
> URL: https://issues.apache.org/jira/browse/SOLR-2341
> Project: Solr
>  Issue Type: New Feature
>Reporter: William Mayor
>Priority: Minor
> Attachments: SOLR-2341.patch, SOLR-2341.patch
>
>
> A first crack at creating policies to be used for determining to which of a 
> list of shards a document should go. See discussion on "Distributed Indexing" 
> on dev-list.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Arabic Analyzer

2011-02-06 Thread Digy
Here is a port of Lucene.java's Arabic analyzer (
https://issues.apache.org/jira/browse/LUCENENET-392 )

You can safely remove nunit dependency and test cases from the project.

DIGY

-Original Message-
From: Ben Foster [mailto:b...@planetcloud.co.uk] 
Sent: Sunday, February 06, 2011 5:47 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: Arabic Analyzer

Is it still possible to use fixed term queries in Arabic (i.e. NOT using an
Analyzer)?

Thanks
Ben

On 6 February 2011 00:51, Prescott Nasser  wrote:

>
> Unfortunately, I don't think we have that. We're working on creating a new
> port of the java lucene code, but I don't know the timeline yet - I'm sure
> there will be a lot of chatter on this mailing list soon.
>
> ~Prescott
>
>
>
>
>
> 
> > Date: Sat, 5 Feb 2011 22:57:11 +
> > Subject: Arabic Analyzer
> > From: b...@planetcloud.co.uk
> > To: lucene-net-...@lucene.apache.org
> >
> > Is there an Arabic Analyzer available for Lucene.NET? I see there has
> been
> > one contributed to the Java project but wasn't sure if this has been
> ported.
> >
> > Thanks,
> >
> > Ben
>



-- 

Ben Foster

planetcloud
The Elms, Hawton
Newark-on-Trent
Nottinghamshire
NG24 3RL

www.planetcloud.co.uk



Re: Distributed Indexing

2011-02-06 Thread William Mayor
Hi

Good call about the policies being deterministic; I should've thought of that
earlier.

We've changed the patch to include this and I've removed the random
assignment one (for obvious reasons).

Take a look and let me know what's still to do. (
https://issues.apache.org/jira/browse/SOLR-2341)

Cheers

William

On Thu, Feb 3, 2011 at 5:00 PM, Upayavira  wrote:

>
>  On Thu, 03 Feb 2011 15:12 +, "Alex Cowell"  wrote:
>
> Hi all,
>
> Just a couple of questions that have arisen.
>
> 1. For handling non-distributed update requests (shards param is not
> present or is invalid), our code currently
>
>- assumes the user would like the data indexed, so gets the request
>handler assigned to "/update"
>- executes the request using core.execute() for the SolrCore associated
>with the original request
>
> Is this what we want it to do and is using core.execute() from within a
> request handler a valid method of passing on the update request?
>
>
> Take a look at how it is done in
> handler.component.SearchHandler.handleRequestBody(). I'd say try to follow
> as similar an approach as possible. E.g. it is the SearchHandler that does much
> of the work, branching depending on whether it found a shards parameter.
>
>
> 2. We have partially implemented an update processor which actually
> generates and sends the split update requests to each specified shard (as
> designated by the policy). As it stands, the code shares a lot in common
> with the HttpCommComponent class used for distributed search. Should we look
> at "opening up" the HttpCommComponent class so it could be used by our
> request handler as well or should we continue with our current
> implementation and worry about that later?
>
>
> I agree that you are going to want to implement an UpdateRequestProcessor.
> However, it would seem to me that, unlike search, you're not going to want
> to bother with the existing processor and associated component chain, you're
> going to want to replace the processor with a distributed version.
>
> As to the HttpCommComponent, I'd suggest you make your own educated
> decision. How similar is the class? Could one serve both needs effectively?
>
>
> 3. Our update processor uses a MultiThreadedHttpConnectionManager to send
> parallel updates to shards, can anyone give some appropriate values to be
> used for the defaultMaxConnectionsPerHost and maxTotalConnections params?
> Won't the  values used for distributed search be a little high for
> distributed indexing?
>
>
> You are right, these will likely be lower for distributed indexing, however
> I'd suggest not worrying about it for now, as it is easy to tweak later.
>
> Upayavira
>
>  ---
> Enterprise Search Consultant at Sourcesense UK,
> Making Sense of Open Source
>


[jira] Updated: (SOLR-2341) Shard distribution policy

2011-02-06 Thread William Mayor (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Mayor updated SOLR-2341:


Attachment: SOLR-2341.patch

This patch makes the implemented policy deterministic. This is missing from the 
previous patch. The policy code has also been refactored into its own package.
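As a rough illustration of what a deterministic policy buys you (a sketch, not the actual SOLR-2341 code; the class name merely echoes the policy name seen on the dev list): hashing the document id means the same document always lands on the same shard, so re-indexing a document overwrites its earlier copy instead of duplicating it on another shard.

```java
// Sketch of a deterministic, hash-based shard distribution policy.
public class HashedDistributionPolicy {
    /** Maps a document id to a shard index in [0, numShards). */
    public static int shardFor(String docId, int numShards) {
        // Mask the sign bit rather than using Math.abs, which fails for
        // Integer.MIN_VALUE hash codes.
        return (docId.hashCode() & 0x7fffffff) % numShards;
    }

    public static void main(String[] args) {
        // The same id always maps to the same shard.
        System.out.println(shardFor("doc-42", 4) == shardFor("doc-42", 4)); // true
    }
}
```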

> Shard distribution policy
> -
>
> Key: SOLR-2341
> URL: https://issues.apache.org/jira/browse/SOLR-2341
> Project: Solr
>  Issue Type: New Feature
>Reporter: William Mayor
>Priority: Minor
> Attachments: SOLR-2341.patch, SOLR-2341.patch
>
>
> A first crack at creating policies to be used for determining to which of a 
> list of shards a document should go. See discussion on "Distributed Indexing" 
> on dev-list.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991224#comment-12991224
 ] 

Doron Cohen commented on LUCENE-1540:
-

Following suggestions by Robert, I brought back case insensitivity of path names 
by upper-casing with Locale.ENGLISH, as suggested in 
[toUpperCase()|http://download.oracle.com/javase/6/docs/api/java/lang/String.html#toUpperCase%28%29].
 
Committed:
- r1067764 - 3x
- r1067772 - trunk

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-06 Thread hao yan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991221#comment-12991221
 ] 

hao yan commented on LUCENE-2903:
-

Hi, Paul

I tested ByteBuffer->IntBuffer; it is not faster than converting int[] <-> 
byte[].
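For context, the two conversion styles being compared are roughly these. This is a sketch under the assumption of big-endian int packing (ByteBuffer's default byte order), not the actual patch code.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;

public class IntBytes {
    // View-based approach: read ints through an IntBuffer view of the byte[].
    public static int[] viaIntBuffer(byte[] bytes) {
        IntBuffer ib = ByteBuffer.wrap(bytes).asIntBuffer();
        int[] out = new int[ib.remaining()];
        ib.get(out);
        return out;
    }

    // Manual conversion: assemble each int from 4 bytes, big-endian.
    public static int[] manual(byte[] bytes) {
        int[] out = new int[bytes.length / 4];
        for (int i = 0; i < out.length; i++) {
            int b = i * 4;
            out[i] = ((bytes[b] & 0xff) << 24) | ((bytes[b + 1] & 0xff) << 16)
                   | ((bytes[b + 2] & 0xff) << 8) | (bytes[b + 3] & 0xff);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = {0, 0, 0, 1, 0, 0, 1, 0};
        System.out.println(Arrays.equals(viaIntBuffer(packed), manual(packed))); // true
    }
}
```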

> Improvement of PForDelta Codec
> --
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: hao yan
> Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: 
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding 
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of 
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this 
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
> old PForDelta does not support very large exceptions (since
> the Simple16 does not support very large numbers). Now this has been fixed in 
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
> two PForDelta implementations in the bulk branch (FrameOfRef and 
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the 
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster then FrameOfRef and PatchedFrameOfRef 
> for almost all kinds of queries, slightly worse then BulkVInt.
> 2) My "NewPForDelta" codec can result in the smallest index size among all 4 
> methods, including FrameOfRef, PatchedFrameOfRef, and BulkVInt, and itself)
> 3) All performance test results are achieved by running with "-server" 
> instead of "-client"

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-06 Thread hao yan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991222#comment-12991222
 ] 

hao yan commented on LUCENE-2903:
-

And it sure complicates the PForDelta algorithm a lot to use IntBuffer.set/get.

> Improvement of PForDelta Codec
> --
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: hao yan
> Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: 
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding 
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of 
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this 
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
> old PForDelta does not support very large exceptions (since
> the Simple16 does not support very large numbers). Now this has been fixed in 
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
> two PForDelta implementations in the bulk branch (FrameOfRef and 
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the 
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster then FrameOfRef and PatchedFrameOfRef 
> for almost all kinds of queries, slightly worse then BulkVInt.
> 2) My "NewPForDelta" codec can result in the smallest index size among all 4 
> methods, including FrameOfRef, PatchedFrameOfRef, and BulkVInt, and itself)
> 3) All performance test results are achieved by running with "-server" 
> instead of "-client"

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-06 Thread hao yan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991220#comment-12991220
 ] 

hao yan commented on LUCENE-2903:
-

Hi Michael,

Did you try FrameOfRef and PatchedFrameOfRef?

> Improvement of PForDelta Codec
> --
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: hao yan
> Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: 
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding 
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of 
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this 
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
> old PForDelta does not support very large exceptions (since
> the Simple16 does not support very large numbers). Now this has been fixed in 
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
> two PForDelta implementations in the bulk branch (FrameOfRef and 
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the 
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster then FrameOfRef and PatchedFrameOfRef 
> for almost all kinds of queries, slightly worse then BulkVInt.
> 2) My "NewPForDelta" codec can result in the smallest index size among all 4 
> methods, including FrameOfRef, PatchedFrameOfRef, and BulkVInt, and itself)
> 3) All performance test results are achieved by running with "-server" 
> instead of "-client"

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1067699 - in /lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src: java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java test/org/apache/lucene/benchmark/byTask/feed

2011-02-06 Thread Doron Cohen
Interesting... Thanks Robert for pointing this out!

> "To obtain correct results for locale insensitive strings, use
toUpperCase(Locale.ENGLISH)"

Actually this is one of the things I tried, and it did solve it - with
toUpperCase(Locale.US) - not exactly Locale.ENGLISH, but quite similar I
assume - and, as you suggest, it felt wrong, though for the wrong reasons...

Perhaps I'll change it like this; case insensitivity is a good thing when
running on various OSes.
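The pitfall both suggestions guard against can be shown in a few lines: in the Turkish locale, 'i' upper-cases to the dotted capital İ (U+0130), which is exactly the kind of per-locale surprise that breaks a case-insensitive lookup keyed on ASCII names.

```java
import java.util.Locale;

public class UpperCaseLocale {
    public static void main(String[] args) {
        // Turkish upper-cases 'i' to dotted capital İ (U+0130), so a
        // case-insensitive path lookup keyed on "FBIS" misses.
        String turkish = "fbis".toUpperCase(new Locale("tr"));
        String english = "fbis".toUpperCase(Locale.ENGLISH);
        System.out.println(turkish.equals("FBIS")); // false: "FBİS"
        System.out.println(english.equals("FBIS")); // true
    }
}
```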

On Sun, Feb 6, 2011 at 6:55 PM, Robert Muir  wrote:

> Thanks for catching this Doron. Another option if you want to keep the
> case-insensitive feature here would be to use
> toUpperCase(Locale.ENGLISH)
>
> It might look bad, but its actually recommended by the JDK for
> locale-insensitive strings:
>
> http://download.oracle.com/javase/6/docs/api/java/lang/String.html#toUpperCase()
>
> On Sun, Feb 6, 2011 at 11:43 AM,   wrote:
> > Author: doronc
> > Date: Sun Feb  6 16:43:54 2011
> > New Revision: 1067699
> >
> > URL: http://svn.apache.org/viewvc?rev=1067699&view=rev
> > Log:
> > LUCENE-1540: Improvements to contrib.benchmark for TREC collections - fix
> test failures in some locales due to toUpperCase()
> >
> > Modified:
> >
>  
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
> >
>  
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip
> >
> > Modified:
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
> > URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java?rev=1067699&r1=1067698&r2=1067699&view=diff
> >
> ==
> > ---
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
> (original)
> > +++
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
> Sun Feb  6 16:43:54 2011
> > @@ -29,7 +29,12 @@ import java.util.Map;
> >  public abstract class TrecDocParser {
> >
> >   /** Types of trec parse paths, */
> > -  public enum ParsePathType { GOV2, FBIS, FT, FR94, LATIMES }
> > +  public enum ParsePathType { GOV2("gov2"), FBIS("fbis"), FT("ft"),
> FR94("fr94"), LATIMES("latimes");
> > +public final String dirName;
> > +private ParsePathType(String dirName) {
> > +  this.dirName = dirName;
> > +}
> > +  }
> >
> >   /** trec parser type used for unknown extensions */
> >   public static final ParsePathType DEFAULT_PATH_TYPE  =
> ParsePathType.GOV2;
> > @@ -46,7 +51,7 @@ public abstract class TrecDocParser {
> >   static final Map pathName2Type = new
> HashMap();
> >   static {
> > for (ParsePathType ppt : ParsePathType.values()) {
> > -  pathName2Type.put(ppt.name(),ppt);
> > +  pathName2Type.put(ppt.dirName,ppt);
> > }
> >   }
> >
> > @@ -59,7 +64,7 @@ public abstract class TrecDocParser {
> >   public static ParsePathType pathType(File f) {
> > int pathLength = 0;
> > while (f != null && ++pathLength < MAX_PATH_LENGTH) {
> > -  ParsePathType ppt = pathName2Type.get(f.getName().toUpperCase());
> > +  ParsePathType ppt = pathName2Type.get(f.getName());
> >   if (ppt!=null) {
> > return ppt;
> >   }
> >
> > Modified:
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip
> > URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip?rev=1067699&r1=1067698&r2=1067699&view=diff
> >
> ==
> > Binary files - no diff available.
> >
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991210#comment-12991210
 ] 

Doron Cohen commented on LUCENE-1540:
-

bq. I wish we knew of a good solution, because I hate it when things aren't 
completely reproducible everywhere.

Thanks Robert, I am actually very pleased with this array of testing with 
various parameters like locale and others randomly selected - it is very 
powerful. Since the failure printed all the parameters used and even the 
ant line to reproduce(\!), it was possible to reproduce in 3x and, once 
I understood what the problem was, also possible to reproduce in trunk - to me 
this is testing's heaven...

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2609) Generate jar containing test classes.

2011-02-06 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2609.


Resolution: Fixed

Committed revision 1067738.

Thanks all for your comments and help !

> Generate jar containing test classes.
> -
>
> Key: LUCENE-2609
> URL: https://issues.apache.org/jira/browse/LUCENE-2609
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.2
>Reporter: Drew Farris
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, 
> LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, 
> LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch
>
>
> The test classes are useful for writing unit tests for code external to the 
> Lucene project. It would be helpful to build a jar of these classes and 
> publish them as a maven dependency.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991198#comment-12991198
 ] 

Uwe Schindler commented on LUCENE-2907:
---

Thanks, really nice now :-)

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907.patch, 
> LUCENE-2907_repro.patch, correct_seeks.txt, incorrect_seeks.txt, 
> seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2907.
-

Resolution: Fixed
  Assignee: Robert Muir

Committed revision 1067720.

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907.patch, 
> LUCENE-2907_repro.patch, correct_seeks.txt, incorrect_seeks.txt, 
> seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2256) CommonsHttpSolrServer.deleteById(emptyList) causes SolrException: missing_content_stream

2011-02-06 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991191#comment-12991191
 ] 

Stevo Slavic commented on SOLR-2256:


I've experienced similar behavior with SolrJ 1.4.1 - I later discovered that the 
actual problem was that the index schema was outdated; it was missing a field 
that was present in the document.

> CommonsHttpSolrServer.deleteById(emptyList) causes SolrException: 
> missing_content_stream
> 
>
> Key: SOLR-2256
> URL: https://issues.apache.org/jira/browse/SOLR-2256
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.4.1
>Reporter: Maxim Valyanskiy
>Priority: Minor
>
> Call to deleteById method of CommonsHttpSolrServer with empty list causes 
> following exception:
> org.apache.solr.common.SolrException: missing_content_stream
> missing_content_stream
> request: http://127.0.0.1:8983/solr/update/javabin
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> at org.apache.solr.client.solrj.SolrServer.deleteById(SolrServer.java:106)
> at 
> ru.org.linux.spring.SearchQueueListener.reindexMessage(SearchQueueListener.java:89)
> Here is TCP stream captured by Wireshark:
> =
> POST /solr/update HTTP/1.1
> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
> User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
> Host: 127.0.0.1:8983
> Content-Length: 20
> wt=javabin&version=1
> =
> HTTP/1.1 400 missing_content_stream
> Content-Type: text/html; charset=iso-8859-1
> Content-Length: 1401
> Server: Jetty(6.1.3)
> = [ html reply skipped ] ===
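As a sketch of the obvious client-side workaround (a hypothetical guard, not a change to the SolrJ API): skip the request entirely when the id list is empty, since SolrJ 1.4.1 otherwise posts an update with no content stream and the server answers 400.

```java
import java.util.List;

public class SafeDelete {
    // Hypothetical guard illustrating the workaround: only issue the
    // delete-by-id request when there is at least one id to delete.
    static boolean shouldSendDelete(List<String> ids) {
        return ids != null && !ids.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(shouldSendDelete(List.of()));     // false
        System.out.println(shouldSendDelete(List.of("42"))); // true
    }
}
```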

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991187#comment-12991187
 ] 

Uwe Schindler commented on LUCENE-2908:
---

+1

> clean up serialization in the codebase
> --
>
> Key: LUCENE-2908
> URL: https://issues.apache.org/jira/browse/LUCENE-2908
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2908.patch
>
>
> We removed contrib/remote, but forgot to cleanup serialization hell 
> everywhere.
> this is no longer needed, never really worked (e.g. across versions), and 
> slows 
> development (e.g. i wasted a long time debugging stupid serialization of 
> Similarity.idfExplain when trying to make a patch for the scoring system).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-02-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991186#comment-12991186
 ] 

Robert Muir commented on LUCENE-2906:
-

{quote}
How will this differ from the SmartChineseAnalyzer?
{quote}

The SmartChineseAnalyzer is for Simplified Chinese only... this is about the 
language-independent technique similar to what CJKAnalyzer does today.

{quote}
I doubt it but can this be in 3.1?
{quote}

Well, I hate the way CJKAnalyzer treats things like supplementary characters 
(wrongly).
This is definitely a bug, and it is fixed here. Part of me wants to fix this as 
quickly as possible.

At the same time, though, I would prefer 3.2... otherwise I would feel like I am 
rushing things.

I don't think 3.2 needs to come a year after 3.1... in fact, since we have a 
stable branch, I think it's
stupid to make bugfix releases like 3.1.1 when we could just push out a new 
minor version (3.2) with
bugfixes instead. The whole branch is intended to be stable changes, so I think 
this is a better use
of our time. But this is just my opinion; we can discuss it later on the list 
as one idea to promote 
more rapid releases.
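For context, the overlapping-bigram technique being discussed is just an adjacent-pair window over a run of Han/Kana characters. A toy sketch, not the actual Lucene TokenFilter API; a real filter must also handle the supplementary characters mentioned above via code points, which this char-based sketch deliberately ignores:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Produce overlapping character bigrams, CJKAnalyzer-style, from one
    // contiguous run of CJK characters. Hypothetical helper for illustration.
    static List<String> overlappingBigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() < 2) {   // a lone character is emitted as-is
            out.add(run);
            return out;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(overlappingBigrams("一二三四")); // [一二, 二三, 三四]
    }
}
```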


> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> 
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Tom Burton-West
>Priority: Minor
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK as in the CJK 
> Analyzer.  This filter would take the output of the ICUtokenizer, read the 
> ScriptAttribute and for selected scripts (Han, Kana), would produce 
> overlapping bigrams.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991181#comment-12991181
 ] 

Doron Cohen commented on LUCENE-1540:
-

Fix for the locale issue merged to trunk at r1076605.
Keeping open for a day or so to make sure there are no more failures.

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991180#comment-12991180
 ] 

Robert Muir commented on LUCENE-2907:
-

bq. I am not sure if CompiledAutomaton is a good name since it is not really 
an automaton, is it?

It is a compiled form of the automaton... and it is a DFA, mathematically.

At the end of the day, CompiledAutomaton is an internal API; we can change 
its name at any time.
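For readers who don't follow the terminology: a DFA (deterministic finite automaton) is just a transition function plus a set of accept states. A toy example, unrelated to the internals of Lucene's actual CompiledAutomaton:

```java
public class ToyDfa {
    // Toy DFA accepting the regular language a b* :
    // state 0 --a--> state 1, state 1 --b--> state 1; only state 1 accepts.
    static boolean accepts(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            if (state == 0 && c == 'a') state = 1;
            else if (state == 1 && c == 'b') state = 1;
            else return false;  // no transition defined: reject
        }
        return state == 1;
    }

    public static void main(String[] args) {
        System.out.println(accepts("abbb")); // true
        System.out.println(accepts("ba"));   // false
    }
}
```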


> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907.patch, 
> LUCENE-2907_repro.patch, correct_seeks.txt, incorrect_seeks.txt, 
> seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991179#comment-12991179
 ] 

Robert Muir commented on LUCENE-1540:
-

Hi Doron, about the test random seeds:

It is complicated (though maybe we could fix this!) for the same random seed in 
trunk to work just like in 3.x.

But for the locales: a random locale is picked from the available 
system locales. These change from JRE to JRE,
so unfortunately we cannot guarantee that the same seed chooses the same locale 
randomly... It's the same with 
timezones - and those even change in minor JDK updates! 

I wish we knew of a good solution, because I hate it when things aren't 
completely reproducible everywhere.


> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991178#comment-12991178
 ] 

Simon Willnauer commented on LUCENE-2907:
-

patch looks good - just being super picky: you don't need all the this.bla in 
CompiledAutomaton ;)

I am not sure if CompiledAutomaton is a good name since it is not really an 
automaton, is it?

simon

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907.patch, 
> LUCENE-2907_repro.patch, correct_seeks.txt, incorrect_seeks.txt, 
> seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991177#comment-12991177
 ] 

Simon Willnauer commented on LUCENE-2908:
-

big +1 to get rid of Serializable - it's broken anyway, slow, and doesn't really 
work across versions! Folks who want to send stuff over the wire using 
Java serialization should put API sugar on top.



> clean up serialization in the codebase
> --
>
> Key: LUCENE-2908
> URL: https://issues.apache.org/jira/browse/LUCENE-2908
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2908.patch
>
>
> We removed contrib/remote, but forgot to cleanup serialization hell 
> everywhere.
> this is no longer needed, never really worked (e.g. across versions), and 
> slows 
> development (e.g. i wasted a long time debugging stupid serialization of 
> Similarity.idfExplain when trying to make a patch for the scoring system).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991176#comment-12991176
 ] 

Doron Cohen commented on LUCENE-1540:
-

I am able to reproduce this on Linux.
The test fails with *locale tr_TR* because TrecDocParser was upper-casing the 
file names to decide which parser to apply.
The problem is that toUpperCase is locale-sensitive, so under tr_TR the file 
name no longer matched the enum name.
Fixed by adding a lower-case dirName member to the enums.
Also recreated the test files zip with '-UN u' for UTF-8 handling of file names 
in the zip.

Committed at r1067699 for 3x.

In trunk the test passes with the same args on Linux too, but fails if you pass 
the locale that was randomly selected in 3x, i.e. like this: 
ant test -Dtestcase=TrecContentSourceTest -Dtestmethod=testTrecFeedDirAllTypes 
-Dtests.locale=tr_TR

Will merge the fix to trunk shortly.
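The tr_TR failure mode is easy to demonstrate in isolation. A minimal sketch of the locale-sensitive case mapping (the well-known Turkish dotted/dotless i problem) and the locale-fixed alternative:

```java
import java.util.Locale;

public class TurkishCaseDemo {
    public static void main(String[] args) {
        String dirName = "fbis";
        // Under tr_TR, 'i' upper-cases to dotted capital 'İ' (U+0130),
        // so the result no longer matches the enum name FBIS.
        String turkish = dirName.toUpperCase(new Locale("tr", "TR"));
        // Locale-insensitive upper-casing for program identifiers.
        String english = dirName.toUpperCase(Locale.ENGLISH);

        System.out.println(turkish);                 // FBİS
        System.out.println(english);                 // FBIS
        System.out.println("FBIS".equals(turkish));  // false
        System.out.println("FBIS".equals(english));  // true
    }
}
```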

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2907:


Attachment: LUCENE-2907.patch

here's the same patch, but cleaned up a bit (e.g. making some things private, 
final, etc)

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907.patch, 
> LUCENE-2907_repro.patch, correct_seeks.txt, incorrect_seeks.txt, 
> seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2894) Use of google-code-prettify for Lucene/Solr Javadoc

2011-02-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991173#comment-12991173
 ] 

Steven Rowe commented on LUCENE-2894:
-

Both of the nightly Hudson Maven builds failed because javadoc jars were not 
produced by the Ant build (scroll down to the bottom to see the error about 
javadoc jars not being available to deploy): 

https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/17/consoleText
https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/16/consoleText

> Use of google-code-prettify for Lucene/Solr Javadoc
> ---
>
> Key: LUCENE-2894
> URL: https://issues.apache.org/jira/browse/LUCENE-2894
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Javadocs
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch, 
> LUCENE-2894.patch
>
>
> My company, RONDHUIT uses google-code-prettify (Apache License 2.0) in 
> Javadoc for syntax highlighting:
> http://www.rondhuit-demo.com/RCSS/api/com/rondhuit/solr/analysis/JaReadingSynonymFilterFactory.html
> I think we can use it for Lucene javadoc (java sample code in overview.html 
> etc) and Solr javadoc (Analyzer Factories etc) to improve or simplify our 
> life.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1067699 - in /lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src: java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java test/org/apache/lucene/benchmark/byTask/feed

2011-02-06 Thread Robert Muir
Thanks for catching this Doron. Another option, if you want to keep the
case-insensitive behavior here, would be to use
toUpperCase(Locale.ENGLISH)

It might look bad, but it's actually recommended by the JDK for
locale-insensitive strings:
http://download.oracle.com/javase/6/docs/api/java/lang/String.html#toUpperCase()

On Sun, Feb 6, 2011 at 11:43 AM,   wrote:
> Author: doronc
> Date: Sun Feb  6 16:43:54 2011
> New Revision: 1067699
>
> URL: http://svn.apache.org/viewvc?rev=1067699&view=rev
> Log:
> LUCENE-1540: Improvements to contrib.benchmark for TREC collections - fix 
> test failures in some locales due to toUpperCase()
>
> Modified:
>    
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
>    
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java?rev=1067699&r1=1067698&r2=1067699&view=diff
> ==
> --- 
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
>  (original)
> +++ 
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecDocParser.java
>  Sun Feb  6 16:43:54 2011
> @@ -29,7 +29,12 @@ import java.util.Map;
>  public abstract class TrecDocParser {
>
>   /** Types of trec parse paths, */
> -  public enum ParsePathType { GOV2, FBIS, FT, FR94, LATIMES }
> +  public enum ParsePathType { GOV2("gov2"), FBIS("fbis"), FT("ft"), 
> FR94("fr94"), LATIMES("latimes");
> +    public final String dirName;
> +    private ParsePathType(String dirName) {
> +      this.dirName = dirName;
> +    }
> +  }
>
>   /** trec parser type used for unknown extensions */
>   public static final ParsePathType DEFAULT_PATH_TYPE  = ParsePathType.GOV2;
> @@ -46,7 +51,7 @@ public abstract class TrecDocParser {
>   static final Map<String,ParsePathType> pathName2Type = new 
> HashMap<String,ParsePathType>();
>   static {
>     for (ParsePathType ppt : ParsePathType.values()) {
> -      pathName2Type.put(ppt.name(),ppt);
> +      pathName2Type.put(ppt.dirName,ppt);
>     }
>   }
>
> @@ -59,7 +64,7 @@ public abstract class TrecDocParser {
>   public static ParsePathType pathType(File f) {
>     int pathLength = 0;
>     while (f != null && ++pathLength < MAX_PATH_LENGTH) {
> -      ParsePathType ppt = pathName2Type.get(f.getName().toUpperCase());
> +      ParsePathType ppt = pathName2Type.get(f.getName());
>       if (ppt!=null) {
>         return ppt;
>       }
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/trecdocs.zip?rev=1067699&r1=1067698&r2=1067699&view=diff
> ==
> Binary files - no diff available.
>
>
>

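Pulling the committed change together, the fixed lookup can be sketched as a self-contained program (class names hypothetical; the real code lives in TrecDocParser):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix: each enum constant carries an explicit lower-case
// dirName, so no locale-sensitive case conversion is needed at lookup time.
enum ParsePathType {
    GOV2("gov2"), FBIS("fbis"), FT("ft"), FR94("fr94"), LATIMES("latimes");

    public final String dirName;

    ParsePathType(String dirName) {
        this.dirName = dirName;
    }
}

public class PathTypeLookup {
    static final Map<String, ParsePathType> pathName2Type = new HashMap<>();
    static {
        for (ParsePathType ppt : ParsePathType.values()) {
            pathName2Type.put(ppt.dirName, ppt);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathName2Type.get("fbis")); // FBIS
        System.out.println(pathName2Type.get("FBIS")); // null - exact match only
    }
}
```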
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1799) Unicode compression

2011-02-06 Thread DM Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991170#comment-12991170
 ] 

DM Smith commented on LUCENE-1799:
--

Any idea as to when this will be released?

> Unicode compression
> ---
>
> Key: LUCENE-1799
> URL: https://issues.apache.org/jira/browse/LUCENE-1799
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Store
>Affects Versions: 2.4.1
>Reporter: DM Smith
>Priority: Minor
> Attachments: Benchmark.java, Benchmark.java, Benchmark.java, 
> LUCENE-1779.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
> LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
> LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
> LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch
>
>
> In lucene-1793, there is the off-topic suggestion to provide compression of 
> Unicode data. The motivation was a custom encoding in a Russian analyzer. The 
> original supposition was that it provided a more compact index.
> This led to the comment that a different or compressed encoding would be a 
> generally useful feature. 
> BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM 
> with an implementation in ICU. If Lucene provided its own implementation, a 
> freely available, royalty-free license would need to be obtained.
> SCSU is another Unicode compression algorithm that could be used. 
> An advantage of these methods is that they work on the whole of Unicode. If 
> that is not needed an encoding such as iso8859-1 (or whatever covers the 
> input) could be used.
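As a rough illustration of why the encoding choice matters for index size (assuming Cyrillic-only input and that the windows-1251 charset is available in the JRE):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {
    public static void main(String[] args) {
        String russian = "поиск"; // 5 Cyrillic letters ("search")
        int utf8 = russian.getBytes(StandardCharsets.UTF_8).length;
        int cp1251 = russian.getBytes(Charset.forName("windows-1251")).length;

        System.out.println(utf8);   // 10 - UTF-8 needs 2 bytes per Cyrillic char
        System.out.println(cp1251); // 5  - single-byte legacy encoding
    }
}
```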

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-02-06 Thread DM Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991169#comment-12991169
 ] 

DM Smith commented on LUCENE-2906:
--

Two questions:
How will this differ from the SmartChineseAnalyzer?
I doubt it but can this be in 3.1?

> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> 
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Tom Burton-West
>Priority: Minor
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK as in the CJK 
> Analyzer.  This filter would take the output of the ICUtokenizer, read the 
> ScriptAttribute and for selected scripts (Han, Kana), would produce 
> overlapping bigrams.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2908) clean up serialization in the codebase

2011-02-06 Thread Robert Muir (JIRA)
clean up serialization in the codebase
--

 Key: LUCENE-2908
 URL: https://issues.apache.org/jira/browse/LUCENE-2908
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0
 Attachments: LUCENE-2908.patch

We removed contrib/remote, but forgot to clean up the serialization hell everywhere.

This is no longer needed, never really worked (e.g. across versions), and slows 
development (e.g. I wasted a long time debugging stupid serialization of 
Similarity.idfExplain when trying to make a patch for the scoring system).





[jira] Updated: (LUCENE-2908) clean up serialization in the codebase

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2908:


Attachment: LUCENE-2908.patch

Attached is a patch. All tests pass.

> clean up serialization in the codebase
> --
>
> Key: LUCENE-2908
> URL: https://issues.apache.org/jira/browse/LUCENE-2908
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2908.patch
>
>
> We removed contrib/remote, but forgot to cleanup serialization hell 
> everywhere.
> this is no longer needed, never really worked (e.g. across versions), and 
> slows 
> development (e.g. i wasted a long time debugging stupid serialization of 
> Similarity.idfExplain when trying to make a patch for the scoring system).




[jira] Commented: (LUCENE-2894) Use of google-code-prettify for Lucene/Solr Javadoc

2011-02-06 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991165#comment-12991165
 ] 

Koji Sekiguchi commented on LUCENE-2894:


On my Mac, prettify is correctly present under the api directory after ant package:

{code}
$ cd solr
$ ant clean set-fsdir package
$ ls build/docs/api/
allclasses-frame.html          deprecated-list.html   package-list
allclasses-noframe.html        help-doc.html          prettify
constant-values.html           index-all.html         resources
contrib-solr-analysis-extras   index.html             serialized-form.html
contrib-solr-cell              org                    solr
contrib-solr-clustering        overview-frame.html    solrj
contrib-solr-dataimporthandler overview-summary.html  stylesheet+prettify.css
contrib-solr-uima              overview-tree.html
{code}


> Use of google-code-prettify for Lucene/Solr Javadoc
> ---
>
> Key: LUCENE-2894
> URL: https://issues.apache.org/jira/browse/LUCENE-2894
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Javadocs
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch, 
> LUCENE-2894.patch
>
>
> My company, RONDHUIT uses google-code-prettify (Apache License 2.0) in 
> Javadoc for syntax highlighting:
> http://www.rondhuit-demo.com/RCSS/api/com/rondhuit/solr/analysis/JaReadingSynonymFilterFactory.html
> I think we can use it for Lucene javadoc (java sample code in overview.html 
> etc) and Solr javadoc (Analyzer Factories etc) to improve or simplify our 
> life.




[jira] Reopened: (LUCENE-2894) Use of google-code-prettify for Lucene/Solr Javadoc

2011-02-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened LUCENE-2894:



Reopening the issue.

Lucene javadoc on hudson looks fine (syntax highlighting works correctly):

https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/overview-summary.html

but Solr javadoc on hudson does not look good:

https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/handler/component/TermsComponent.html

Building both javadocs on my local machine works fine.

> Use of google-code-prettify for Lucene/Solr Javadoc
> ---
>
> Key: LUCENE-2894
> URL: https://issues.apache.org/jira/browse/LUCENE-2894
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Javadocs
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch, 
> LUCENE-2894.patch
>
>
> My company, RONDHUIT uses google-code-prettify (Apache License 2.0) in 
> Javadoc for syntax highlighting:
> http://www.rondhuit-demo.com/RCSS/api/com/rondhuit/solr/analysis/JaReadingSynonymFilterFactory.html
> I think we can use it for Lucene javadoc (java sample code in overview.html 
> etc) and Solr javadoc (Analyzer Factories etc) to improve or simplify our 
> life.




[HUDSON-MAVEN] Lucene-Solr-Maven-trunk #17: POMs out of sync

2011-02-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/17/

No tests ran.

Build Log (for compile errors):
[...truncated 7757 lines...]






[HUDSON] Lucene-Solr-tests-only-3.x - Build # 4561 - Failure

2011-02-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/4561/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:222)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:177)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1045)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:977)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:484)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:206)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:428)




Build Log (for compile errors):
[...truncated 10090 lines...]






[jira] Updated: (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2906:


Attachment: LUCENE-2906.patch

Here's a patch going in a slightly different direction (though we can still add 
some special ICU-only stuff here).

Instead, the patch synchronizes the token types of ICUTokenizer with 
StandardTokenizer, adds the necessary types to both, and then adds the 
bigramming logic to StandardFilter.

This way, CJK works easily out of the box for all of Unicode (e.g. 
supplementaries) and plays well with other languages. I deprecated CJKTokenizer 
in the patch and pulled out its special full-width filter into a separate 
token filter.

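The overlapping-bigram behavior described above can be sketched outside of Lucene's TokenFilter machinery; the class and method names below are hypothetical helpers, not the patch's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of CJK overlapping bigrams (hypothetical helper, not Lucene's
// actual filter): a run of Han/Kana text like "ABC" becomes the tokens
// "AB" and "BC"; iterating by code point keeps supplementary characters
// (outside the BMP) intact.
public class CjkBigramSketch {
    public static List<String> bigrams(String cjkRun) {
        List<String> out = new ArrayList<>();
        int[] cps = cjkRun.codePoints().toArray();
        if (cps.length == 1) {
            // A lone character has no bigram partner; emit it as a unigram.
            out.add(cjkRun);
            return out;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2)); // overlapping window of two
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("\u4e00\u4e8c\u4e09"));
    }
}
```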

> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> 
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Tom Burton-West
>Priority: Minor
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK as in the CJK 
> Analyzer.  This filter would take the output of the ICUtokenizer, read the 
> ScriptAttribute and for selected scripts (Han, Kana), would produce 
> overlapping bigrams.




Re: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 4555 - Failure

2011-02-06 Thread Doron Cohen
checking...

On Sun, Feb 6, 2011 at 2:19 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think this is happening because of LUCENE-1540...
>
> Mike
>
> On Sun, Feb 6, 2011 at 5:25 AM, Apache Hudson Server
>  wrote:
> > Build:
> https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/4555/
> >
> > 1 tests failed.
> > REGRESSION:
>  
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes
> >
> > Error Message:
> > expected: but was:
> >
> > Stack Trace:
> >at
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.assertDocData(TrecContentSourceTest.java:70)
> >at
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes(TrecContentSourceTest.java:369)
> >at
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1045)
> >at
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:977)
> >
> >
> >
> >
> > Build Log (for compile errors):
> > [...truncated 6504 lines...]
> >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] Updated: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2907:


Attachment: LUCENE-2907.patch

Attached is a patch. I removed all the transient/synchronized stuff from the 
query.

Instead, AutomatonTermsEnum only takes an immutable, compiled form of the 
automaton (essentially a sorted transitions array).

The query computes this compiled form (or any other simpler rewritten form) in 
its ctor.

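The pattern described here, precomputing an immutable form in the constructor instead of lazily caching mutable state, is what makes the enum safe to share across search threads. A minimal sketch (hypothetical names, not Lucene's actual classes):

```java
import java.util.Arrays;

// Sketch of the "compile once in the ctor" pattern (hypothetical names,
// not Lucene's actual API): all derived state is computed up front and
// stored in a final field, so instances can be used from multiple search
// threads without synchronization or lazy caches to race on.
public final class CompiledAutomaton {
    // Immutable, sorted transitions array computed once at construction.
    private final int[] sortedTransitions;

    public CompiledAutomaton(int[] transitions) {
        int[] copy = transitions.clone(); // defensive copy of caller's array
        Arrays.sort(copy);                // sort eagerly, never mutate again
        this.sortedTransitions = copy;
    }

    // Readers only ever see the precomputed array.
    public boolean accepts(int label) {
        return Arrays.binarySearch(sortedTransitions, label) >= 0;
    }

    public static void main(String[] args) {
        CompiledAutomaton a = new CompiledAutomaton(new int[] {3, 1, 2});
        System.out.println(a.accepts(2) + " " + a.accepts(5));
    }
}
```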

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907.patch, LUCENE-2907_repro.patch, 
> correct_seeks.txt, incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




[jira] Updated: (LUCENE-2907) automaton termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2907:


Summary: automaton termsenum bug when running with multithreaded search  
(was: termsenum bug when running with multithreaded search)

Editing the description so it's not confusing, sorry :)

> automaton termsenum bug when running with multithreaded search
> --
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




Re: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 4555 - Failure

2011-02-06 Thread Michael McCandless
I think this is happening because of LUCENE-1540...

Mike

On Sun, Feb 6, 2011 at 5:25 AM, Apache Hudson Server
 wrote:
> Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/4555/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes
>
> Error Message:
> expected: but was:
>
> Stack Trace:
>        at 
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.assertDocData(TrecContentSourceTest.java:70)
>        at 
> org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes(TrecContentSourceTest.java:369)
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1045)
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:977)
>
>
>
>
> Build Log (for compile errors):
> [...truncated 6504 lines...]
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>




[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2011-02-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991145#comment-12991145
 ] 

Michael McCandless commented on LUCENE-1540:


I think this commit has caused a failure on at least 3.x?
{noformat}
[junit] Testcase: 
testTrecFeedDirAllTypes(org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest):
  Caused an ERROR
[junit] expected: but was:
[junit] at 
org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.assertDocData(TrecContentSourceTest.java:70)
[junit] at 
org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes(TrecContentSourceTest.java:369)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1045)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:977)
[junit] 
[junit] 
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 0.488 sec
[junit] 
[junit] - Standard Error -
[junit] WARNING: test method: 'testBadDate' left thread running: 
Thread[Thread-6,5,main]
[junit] RESOURCE LEAK: test method: 'testBadDate' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TrecContentSourceTest 
-Dtestmethod=testBadDate -Dtests.seed=-1485993969467368126:6510043524258948665 
-Dtests.multiplier=5
[junit] NOTE: reproduce with: ant test -Dtestcase=TrecContentSourceTest 
-Dtestmethod=testTrecFeedDirAllTypes 
-Dtests.seed=-1485993969467368126:-9055415333820766139 -Dtests.multiplier=5
[junit] NOTE: test params are: locale=tr_TR, timezone=Europe/Zagreb
[junit] NOTE: all tests run in this JVM:
[junit] [TrecContentSourceTest]
[junit] NOTE: FreeBSD 8.2-RC2 amd64/Sun Microsystems Inc. 1.6.0 
(64-bit)/cpus=16,threads=1,free=66439840,total=86376448
[junit] -  ---
{noformat}

> Improvements to contrib.benchmark for TREC collections
> --
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Tim Armstrong
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch, 
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
> are quite limited and do not support some of the variations in format of 
> older TREC collections.  
> I have been doing some benchmarking work with Lucene and have had to modify 
> the package to support:
> * Older TREC document formats, which the current parser fails on due to 
> missing document headers.
> * Variations in query format - newlines after  tag causing the query 
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to 
> write unit tests for the new functionality first.




[HUDSON-MAVEN] Lucene-Solr-Maven-3.x #16: POMs out of sync

2011-02-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/16/

No tests ran.

Build Log (for compile errors):
[...truncated 8390 lines...]






[HUDSON] Lucene-Solr-tests-only-3.x - Build # 4555 - Failure

2011-02-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/4555/

1 tests failed.
REGRESSION:  
org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes

Error Message:
expected: but was:

Stack Trace:
at 
org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.assertDocData(TrecContentSourceTest.java:70)
at 
org.apache.lucene.benchmark.byTask.feeds.TrecContentSourceTest.testTrecFeedDirAllTypes(TrecContentSourceTest.java:369)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1045)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:977)




Build Log (for compile errors):
[...truncated 6504 lines...]






[jira] Commented: (LUCENE-2907) termsenum bug when running with multithreaded search

2011-02-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991141#comment-12991141
 ] 

Uwe Schindler commented on LUCENE-2907:
---

Yes, the numbered-states cache was always bugging me. But at least it is now 
synchronized (it was not even that at the beginning). I think the problem may 
be that parallel queries are doing different segments with different numbered 
states at the same time.

+1 to removing the cache and calculating it in the ctor; then it's really stateless!

> termsenum bug when running with multithreaded search
> 
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




[jira] Commented: (LUCENE-2907) termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991140#comment-12991140
 ] 

Robert Muir commented on LUCENE-2907:
-

In combination with other things. In my opinion, the problem is the cache in 
getNumberedStates.

But the real solution (in my opinion) is to clean up all this crap so the 
termsenum only takes a completely immutable view of what it needs, and for the 
Query to compile once in its ctor, and to remove any stupid caching.

So, this is what I am working on now.


> termsenum bug when running with multithreaded search
> 
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




[jira] Commented: (LUCENE-2907) termsenum bug when running with multithreaded search

2011-02-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991137#comment-12991137
 ] 

Uwe Schindler commented on LUCENE-2907:
---

A bug in automaton that only happens in multi-threaded search? So it's the cache there?

> termsenum bug when running with multithreaded search
> 
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




[jira] Commented: (LUCENE-2907) termsenum bug when running with multithreaded search

2011-02-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991133#comment-12991133
 ] 

Robert Muir commented on LUCENE-2907:
-

bq. Have you found out what happens or where a thread-safety issue could be?

Yes, I found the bug... unfortunately it is actually my automaton problem :(
I will create a nice patch today.

bq. The information on this issue is too small, there seems to be lots of 
IRC/GTalk communication in parallel.

What do you mean? Mike was working on the bug for a long time, but quickly had 
to stop working on it, so he emailed me all of his state. I took over from 
there for a while, and I opened this issue with my debugging... though I 
didn't have much time to work on it yesterday (only about an hour), because I 
already had plans.

I tried to be completely open and dump all of my state/debugging 
information/brainstorming on this JIRA issue, but it only resulted in me 
reporting misleading and confusing information... so I think the information 
on this issue is actually too much?


> termsenum bug when running with multithreaded search
> 
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.




[jira] Commented: (LUCENE-2609) Generate jar containing test classes.

2011-02-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991132#comment-12991132
 ] 

Shai Erera commented on LUCENE-2609:


Thanks, Steven!

Committed revision 1067623 (3x).

Merging to trunk now ...

> Generate jar containing test classes.
> -
>
> Key: LUCENE-2609
> URL: https://issues.apache.org/jira/browse/LUCENE-2609
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.2
>Reporter: Drew Farris
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, 
> LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch, 
> LUCENE-2609.patch, LUCENE-2609.patch, LUCENE-2609.patch
>
>
> The test classes are useful for writing unit tests for code external to the 
> Lucene project. It would be helpful to build a jar of these classes and 
> publish them as a maven dependency.




[jira] Commented: (LUCENE-2907) termsenum bug when running with multithreaded search

2011-02-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991130#comment-12991130
 ] 

Uwe Schindler commented on LUCENE-2907:
---

Have you found out what happens or where a thread-safety issue could be? Each 
thread and each query should have its own TermsEnum! Is there maybe a cache at 
the codec level involved? At least there are no multiple-instance (static) 
caches on the search side of the TermsEnums, so there must be a 
multi-threading issue in the underlying SegmentReaders.

To conclude: at least Robert found out that FieldCacheTermsEnum always works 
correctly? Is this true? The information on this issue is too small; there 
seems to be lots of IRC/GTalk communication in parallel.

> termsenum bug when running with multithreaded search
> 
>
> Key: LUCENE-2907
> URL: https://issues.apache.org/jira/browse/LUCENE-2907
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2907_repro.patch, correct_seeks.txt, 
> incorrect_seeks.txt, seeks_diff.txt
>
>
> This one popped in hudson (with a test that runs the same query against 
> fieldcache, and with a filter rewrite, and compares results)
> However, its actually worse and unrelated to the fieldcache: you can set both 
> to filter rewrite and it will still fail.
