[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-06-01 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287213#comment-13287213
 ] 

Eks Dev commented on LUCENE-3312:
-

bq. My assumption is that StoredField-s will never be used anymore as potential 
sources of token streams?

One case where it might make sense is a scenario where a user wants to store the 
analyzed field (not the original) and later read it back as a TokenStream -- a 
kind of TermVector without tf. I think I remember seeing a great patch with an 
indexable-storable field (with serialization and deserialization).

A user can do it in two passes, but sometimes it is not cheap to analyze two 
times
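
For illustration, a minimal sketch of the idea -- terms only, no 
positions/offsets/payloads; the helper names are mine, and the serialization 
patch mentioned above is surely more complete:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StoredTokenStreams {

  // Flatten an analyzed TokenStream into one storable string, NUL-separated.
  public static String serialize(TokenStream ts) throws IOException {
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    StringBuilder sb = new StringBuilder();
    ts.reset();
    while (ts.incrementToken()) {
      if (sb.length() > 0) sb.append('\u0000');
      sb.append(term);
    }
    ts.end();
    ts.close();
    return sb.toString();
  }

  // Replay a stored string as a TokenStream, skipping the second analysis pass.
  public static final class StoredTokenStream extends TokenStream {
    private final String[] terms;
    private int upto = 0;
    private final CharTermAttribute term = addAttribute(CharTermAttribute.class);

    public StoredTokenStream(String stored) {
      this.terms = stored.split("\u0000");
    }

    @Override
    public boolean incrementToken() {
      if (upto == terms.length) {
        return false;
      }
      clearAttributes();
      term.setEmpty().append(terms[upto++]);
      return true;
    }
  }
}
{code}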



> Break out StorableField from IndexableField
> ---
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Nikola Tankovic
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
> use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



looking for a BooleanMatcher instead of BooleanScorer

2012-06-01 Thread Li Li
hi all,
I am looking for a 'BooleanMatcher' in lucene. for many
applications, we don't need to order matched documents by relevance
scores; we just want the boolean query. But BooleanScorer/BooleanScorer2
is a little bit heavy because of relevance scoring.
one use case is: we have some fields which have a very small number
of tokens (usually only one word), such as id, tag or something else.
But we need a query like this: id in (1,3,5...). if using a
booleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
terms, and BooleanScorer2 uses a priority queue to know how many terms
are matched (coord).
Filters may help, but the query can be very complicated (or else,
since it still uses BooleanQuery internally, there is a recursion problem)

we may divide the current BooleanScorer into a BooleanMatcher and a
Ranker. if we need to score the hit docs, we ask the BooleanScorer for
not only the hit ids but also tf/idf, coord or anything we need to use in
ranking. but sometimes we only need docIds, and then the BooleanMatcher
can optimize its implementation. for the case of many disjunction
terms, we can do it like Filter or BooleanScorer instead of
BooleanScorer2.

is it possible?
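
for the flat id-in-list case, match-only semantics can already be
approximated with a filter wrapped in a ConstantScoreQuery; a rough sketch
(assuming the 3.x contrib/queries TermsFilter; it doesn't cover arbitrarily
nested queries):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermsFilter; // contrib/queries in 3.x

    // Match "id in (1, 3, 5, ...)" without ranking work: the filter sidesteps
    // BooleanScorer's 31-clause bucketing and BooleanScorer2's coord queue.
    TermsFilter idFilter = new TermsFilter();
    for (String id : new String[] { "1", "3", "5" }) {
      idFilter.addTerm(new Term("id", id));
    }
    Query matchOnly = new ConstantScoreQuery(idFilter); // constant score per hit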

following are some user demands I found on the mailing list; the
first one is my own requirement.

1. https://github.com/neo4j/community/issues/494

2. mail to lucene

qibaoy...@126.com qibaoy...@126.com via lucene.apache.org

May 6

to lucene
Hi,
  I met a problem about how to search many keywords in about
5,000,000 documents. For example the query may be like "(a1 or a2 or a3
... a200) and (b1 or b2 or b3 or b4 ... b400)". I found it will take a
very long time (40 seconds) to get the answer in only one field (Title
field), and the JVM will throw an OutOfMemory error with more fields (title
field plus content field). Any suggestions or good ideas to solve this
problem? thanks in advance.


   3. mail to lucene
Chris Book chrisb...@gmail.com via lucene.apache.org

Apr 11

to solr-user
Hello, I have a solr index running that is working very well as a search.
 But I want to add the ability (if possible) to use it to do matching.  The
problem is that by default it is only looking for all the input terms to be
present, and it doesn't give me any indication as to how many terms in the
target field were not specified by the input.

For example, if I'm trying to match to the song title "dust in the wind",
I'm correctly getting a match if the input query is "dust in wind".  But I
don't want to get a match if the input is just "dust".  Although as a
search "dust" should return this result, I'm looking for some way to filter
this out based on some indication that the input isn't close enough to the
output.  Perhaps if I could get information that the number of input
terms is much less than the number of terms in the field.  Or something
else along those lines?

I realize that this isn't the typical use case for a search, but I'm just
looking for some suggestions as to how I could improve the above example a
bit.

Thanks,
Chris

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3488) Create a Collections API for SolrCloud

2012-06-01 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-3488:
--

Attachment: SOLR-3488_2.patch

slight improvements to Mark's patch.
Regarding the template based creation I think it should use a different 
parameter name for the collection template (e.g. "template") and use the 
"collection" parameter for the new collection name.
Apart from that, I think it may be useful to clearly define different creation 
strategies (I've created an interface for that); the right one is chosen on the 
basis of the passed HTTP parameters.
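
To illustrate, the strategy contract could look roughly like this (names are 
illustrative, not necessarily those in the attached patch):

{code}
import org.apache.solr.common.params.SolrParams;

// Sketch: one implementation per creation flavor (plain vs. template-based);
// the handler picks the first strategy whose parameters match the request.
public interface CollectionCreationStrategy {

  // e.g. a template-based strategy requires "template" plus "collection",
  // while a plain strategy only needs "collection".
  boolean accepts(SolrParams params);

  void createCollection(SolrParams params) throws Exception;
}
{code}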


> Create a Collections API for SolrCloud
> --
>
> Key: SOLR-3488
> URL: https://issues.apache.org/jira/browse/SOLR-3488
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Attachments: SOLR-3488.patch, SOLR-3488_2.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

2012-06-01 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287255#comment-13287255
 ] 

Chris Male commented on LUCENE-3976:


Hi Luca,

I'm unsure about this approach.  What other kinds of Exceptions can be thrown 
other than IOExceptions? I think we should explore what those possible errors 
are and fix them at their source, to provide targeted Exceptions.  If there is 
a problem parsing, then we should throw a ParseException with the line number 
causing the problem.
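
For example, the affix parser could fail like this (a sketch; ruleArgs, line 
and lineNumber are illustrative names, not the patch's):

{code}
import java.text.ParseException;

// Sketch: fail with the offending line instead of an eventual
// ArrayIndexOutOfBoundsException deeper in the parser.
if (ruleArgs.length < 5) {
  throw new ParseException("Invalid suffix rule: " + line, lineNumber);
}
{code}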

> Improve error messages for unsupported Hunspell formats
> ---
>
> Key: LUCENE-3976
> URL: https://issues.apache.org/jira/browse/LUCENE-3976
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3976.patch, LUCENE-3976.patch
>
>
> Our hunspell implementation is never going to be able to support the huge 
> variety of formats that are out there, especially since our impl is based on 
> papers written on the topic rather than being a pure port.
> Recently we ran into the following suffix rule:
> {noformat}SFX CA 0 /CaCp{noformat}
> Due to the missing regex conditional, an AOE was being thrown, which made it 
> difficult to diagnose the problem.
> We should instead try to provide better error messages showing what we were 
> unable to parse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-06-01 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287258#comment-13287258
 ] 

Mark Harwood commented on LUCENE-4069:
--

I've thought some more about option 2 (PerFieldPF reusing wrapped PFs) and it 
looks to get very ugly very quickly.
There's only so much PerFieldPF can do to rationalize a random jumble of PF 
instances presented to it by clients. I think the right place to draw the line 
is LUCENE-4093, i.e. a simple .equals() comparison on top-level PFs to eliminate 
any duplicates. Any other approach that also tries to de-dup nested PFs looks 
to be adding a lot of complexity, especially when you consider what that does 
to the model of read-time object instantiation. This would be significant added 
complexity to solve a problem you have already suggested is insignificant (i.e. 
too many files doesn't really matter when using CFS).

I can remove the per-field stuff from BloomPF if you want but I imagine I will 
routinely subclass it to add this optimisation back in to my apps.
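
For reference, the kind of subclass I mean might look like this (a sketch 
against the 4.0 codecs API; the field name and injected formats are 
placeholders):

{code}
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;

// Route only the primary-key field through the Bloom-filtered format.
public class BloomedKeyPostingsFormat extends PerFieldPostingsFormat {
  private final PostingsFormat bloomed;   // e.g. BloomPF wrapping a delegate
  private final PostingsFormat fallback;  // everything else

  public BloomedKeyPostingsFormat(PostingsFormat bloomed, PostingsFormat fallback) {
    this.bloomed = bloomed;
    this.fallback = fallback;
  }

  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return "id".equals(field) ? bloomed : fallback;
  }
}
{code}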




> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0
>Reporter: Mark Harwood
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: BloomFilterPostings40.patch, 
> MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4083) RateLimited.pause() throws unchecked exception

2012-06-01 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved LUCENE-4083.
---

Resolution: Cannot Reproduce

I can't reproduce this, and in any case it was very likely just the test 
framework trying to interrupt all leftover threads.

> RateLimited.pause() throws unchecked exception
> --
>
> Key: LUCENE-4083
> URL: https://issues.apache.org/jira/browse/LUCENE-4083
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/store
>Affects Versions: 4.0
>Reporter: Andrzej Bialecki 
> Fix For: 4.0
>
>
> The while() loop in RateLimiter.pause() invokes Thread.sleep() with 
> potentially large values, which occasionally results in InterruptedException 
> being thrown from Thread.sleep(). This is wrapped in an unchecked 
> ThreadInterruptedException and re-thrown, and results in high-level errors 
> like this:
> {code}
> [junit] 2012-05-29 15:50:15,464 ERROR core.SolrCore - 
> org.apache.lucene.util.ThreadInterruptedException: 
> java.lang.InterruptedException: sleep interrupted
> [junit]   at 
> org.apache.lucene.store.RateLimiter.pause(RateLimiter.java:82)
> [junit]   at 
> org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:82)
> [junit]   at 
> org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
> [junit]   at 
> org.apache.lucene.store.DataOutput.writeVInt(DataOutput.java:191)
> [junit]   at 
> org.apache.lucene.codecs.lucene40.Lucene40PostingsWriter.addPosition(Lucene40PostingsWriter.java:237)
> [junit]   at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:519)
> [junit]   at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
> [junit]   at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
> [junit]   at 
> org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
> [junit]   at 
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
> [junit]   at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
> [junit]   at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
> [junit]   at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:553)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2416)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2548)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2530)
> [junit]   at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
> [junit]   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
> {code}
> I believe this is a bug - the while() loop already ensures that the total 
> time spent in pause() is correct even if InterruptedException-s are thrown, 
> so they should not be re-thrown.
> The patch is trivial - simply don't re-throw.
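> The fixed loop might look roughly like this (a sketch; targetNS is the
> absolute time pause() is waiting for; restoring the interrupt flag is one
> option on top of simply not re-throwing):
> {code}
> boolean interrupted = false;
> long curNS = System.nanoTime();
> while (curNS < targetNS) {
>   long pauseNS = targetNS - curNS;
>   try {
>     Thread.sleep(pauseNS / 1000000, (int) (pauseNS % 1000000));
>   } catch (InterruptedException ie) {
>     // Don't re-throw: the loop already guarantees the total pause time.
>     interrupted = true;
>   }
>   curNS = System.nanoTime();
> }
> if (interrupted) {
>   Thread.currentThread().interrupt(); // let callers still observe it
> }
> {code}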

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

2012-06-01 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287260#comment-13287260
 ] 

Luca Cavanna commented on LUCENE-3976:
--

Hi Chris, 
I agree with you. On the other hand with the affix rule mentioned, before 
LUCENE-4019 we had an AOE, so the additional catch would have been useful just 
to throw a nicer error message like "Error while parsing the affix file". That 
one has been solved at its source, for now I don't see any other possible 
errors but I'm sure there are some, maybe plenty since we support only a subset 
of the formats and features.
It was just a way to introduce a generic error message but I totally agree that 
the right apporach would be fixing everything at the source.

> Improve error messages for unsupported Hunspell formats
> ---
>
> Key: LUCENE-3976
> URL: https://issues.apache.org/jira/browse/LUCENE-3976
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3976.patch, LUCENE-3976.patch
>
>
> Our hunspell implementation is never going to be able to support the huge 
> variety of formats that are out there, especially since our impl is based on 
> papers written on the topic rather than being a pure port.
> Recently we ran into the following suffix rule:
> {noformat}SFX CA 0 /CaCp{noformat}
> Due to the missing regex conditional, an AOE was being thrown, which made it 
> difficult to diagnose the problem.
> We should instead try to provide better error messages showing what we were 
> unable to parse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-06-01 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287267#comment-13287267
 ] 

Andrzej Bialecki  commented on LUCENE-3312:
---

bq. We already have Document and it's going to become confusing with two 
different Document classes

+1 to use a better name (LuceneDocument? AbstractDocument?).

bq. I don't think it should hold Indexable/StorableField instances but instead 
should just hold Field instances.

With the Field class implementing IndexableField and StorableField, and on 
retrieval returning a different class that implements only StorableField? Well, 
at least it would allow for expressing the association between consecutive 
stored/indexed values that we can express now when creating a Document for 
indexing. But the strong decoupling of stored/indexed parts of a field has its 
benefits too (arbitrary sequences of stored/indexed parts of fields)... and if 
you require a specific implementation at the level of (input) Document then you 
prevent users from using their own impls. of strongly decoupled sequences of 
StoredField/IndexedField.
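
For the sake of discussion, a self-contained sketch of that shape (method sets 
are illustrative, not the branch's exact API):

{code}
// The two decoupled views.
interface StorableField  { String name(); String stringValue(); }
interface IndexableField { String name(); /* + analyzed/index-time view */ }

// Input side: one class implements both, so the stored/indexed association
// is preserved when the app wants it...
class Field implements IndexableField, StorableField {
  private final String name, value;
  Field(String name, String value) { this.name = name; this.value = value; }
  public String name() { return name; }
  public String stringValue() { return value; }
}

// ...while retrieval returns a storage-only view, so a search result can't
// silently be fed back to the indexer.
class RetrievedField implements StorableField {
  private final String name, value;
  RetrievedField(String name, String value) { this.name = name; this.value = value; }
  public String name() { return name; }
  public String stringValue() { return value; }
}
{code}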

bq. I think I remember seeing a great patch with an indexable-storable field 
(with serialization and deserialization).
SOLR-1535 .

> Break out StorableField from IndexableField
> ---
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Nikola Tankovic
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
> use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: looking for a BooleanMatcher instead of BooleanScorer

2012-06-01 Thread Tanguy Moal
Hello,

I'm just sharing my thoughts, they might be off-topic...

Take the first example quoted from github: the user wants to find all
nodes having their facebookId in a given, quite long list (a friends list;
be aware that some facebook users have 1500+ friends!).

The application first had the facebookId for a user (say id=someId),
requested the facebook graph with that id, and got a quite long list of
facebookIds back, right?
At that point, I think the application should not try to enumerate its neo4j
graph using an OR-ed facebookIds list.
It should make sure that each neo4j node in the friends list has a
"friendOf" attribute, ensure that this multivalued attribute contains
the facebookId someId for each involved node, and trigger an update request
for those updated nodes.
You could make your application wait for that update to complete if it
really needs to be synchronous with facebook.
That moves the problem to handling update requests smartly, which might
sometimes be easier.
Here you will eventually want to store a hash of the user's friends list
somewhere in the user's node, so you know in advance whether that user's
friends list has changed and whether you need to trigger the update process
again (just thinking).
When your user uses the application for the first time, or every time after
she updates her friends list, an update job will be fired for that user.
You may want to wait for the update request to complete only the first time
(if you don't need your app to be 100% synchronized with facebook), and
queue the subsequent jobs to something that handles these updates
efficiently.  That could stress the storage system with intensive writes
from time to time, especially at the beginning, but that will converge to
a mainly read-based application after most active users have used the
application once. New friendships aren't that frequent (IMHO).
Maybe NRT developments could be used in this scenario... I don't know much
more. I don't know anything about how Neo4J works; I used it once, that's
all.
Anyway, if you hit write issues, congratulations, your application is being
used widely; go buy SSD disks :)

Finally, you will then enumerate your nodes with a very quick and efficient
query friendOf:"someId" .
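
Against the plain Lucene 3.x API (whatever neo4j wraps around it), the
denormalized node entry and the lookup are as simple as (a sketch; field
names as above):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // One multivalued "friendOf" entry per friendship on the node's document.
    Document node = new Document();
    node.add(new Field("facebookId", "12345", Field.Store.YES, Field.Index.NOT_ANALYZED));
    node.add(new Field("friendOf", "someId", Field.Store.NO, Field.Index.NOT_ANALYZED));
    node.add(new Field("friendOf", "otherId", Field.Store.NO, Field.Index.NOT_ANALYZED));

    // "all nodes that are friends of someId" becomes a single cheap TermQuery.
    Query friends = new TermQuery(new Term("friendOf", "someId"));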


What I meant is that if your application really needs to perform
queries made of many, many, many, ... really many OR-ed terms,
then there might exist (though it's not always true) a different design of
your data model that still lets you fit the use case of a search
engine.

This applies to 1 and maybe to 2 too. ( :p 2-2-2 -- never mind )

I don't really understand 3, which seems to be a MinShouldMatch issue.

As I said in the beginning, I'm simply sharing my thoughts! I hope this
helps...

--
Tanguy

2012/6/1 Li Li 

> hi all,
>I am looking for a 'BooleanMatcher' in lucene. for many
> applications, we don't need to order matched documents by relevance
> scores; we just want the boolean query. But BooleanScorer/BooleanScorer2
> is a little bit heavy because of relevance scoring.
>one use case is: we have some fields which have a very small number
> of tokens (usually only one word), such as id, tag or something else.
>But we need a query like this: id in (1,3,5...). if using a
> booleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
> terms, and BooleanScorer2 uses a priority queue to know how many terms
> are matched (coord).
>Filters may help, but the query can be very complicated (or else,
> since it still uses BooleanQuery internally, there is a recursion problem)
>
>we may divide the current BooleanScorer into a BooleanMatcher and a
> Ranker. if we need to score the hit docs, we ask the BooleanScorer for
> not only the hit ids but also tf/idf, coord or anything we need to use in
> ranking. but sometimes we only need docIds, and then the BooleanMatcher
> can optimize its implementation. for the case of many disjunction
> terms, we can do it like Filter or BooleanScorer instead of
> BooleanScorer2.
>
>is it possible?
>
>following are some user demands I found on the mailing list; the
> first one is my own requirement.
>
>1. https://github.com/neo4j/community/issues/494
>
>2. mail to lucene
>
> qibaoy...@126.com qibaoy...@126.com via lucene.apache.org
>
> May 6
>
> to lucene
> Hi,
>  I met a problem about how to search many keywords in about
> 5,000,000 documents. For example the query may be like "(a1 or a2 or a3
> ... a200) and (b1 or b2 or b3 or b4 ... b400)". I found it will take a
> very long time (40 seconds) to get the answer in only one field (Title
> field), and the JVM will throw an OutOfMemory error with more fields (title
> field plus content field). Any suggestions or good ideas to solve this
> problem? thanks in advance.
>
>
>   3 mail to lucene
> Chris Book chrisb...@gmail.com via lucene.apache.org
>
> Apr 11
>
> to solr-user
> Hello, I have a solr index running that is working very well as a search.
>  But I want to add the ability (if possible) t

Re: looking for a BooleanMatcher instead of BooleanScorer

2012-06-01 Thread Li Li
sorry, the first problem is not mine.

On Fri, Jun 1, 2012 at 4:58 PM, Tanguy Moal  wrote:
> Hello,
>
> I'm just sharing my thoughts, they might be off-topic...
>
> Take the first example quoted from github: the user wants to find all nodes
> having their facebookId in a given, quite long list (a friends list; be
> aware that some facebook users have 1500+ friends!).
>
> The application first had the facebookId for a user (say id=someId),
> requested the facebook graph with that id, and got a quite long list of
> facebookIds back, right?
> At that point, I think the application should not try to enumerate its neo4j
> graph using an OR-ed facebookIds list.
> It should make sure that each neo4j node in the friends list has a
> "friendOf" attribute, ensure that this multivalued attribute contains the
> facebookId someId for each involved node, and trigger an update request for
> those updated nodes.
> You could make your application wait for that update to complete if it
> really needs to be synchronous with facebook.
> That moves the problem to handling update requests smartly, which might
> sometimes be easier.
> Here you will eventually want to store a hash of the user's friends list
> somewhere in the user's node, so you know in advance whether that user's
> friends list has changed and whether you need to trigger the update process
> again (just thinking).
> When your user uses the application for the first time, or every time after
> she updates her friends list, an update job will be fired for that user. You
> may want to wait for the update request to complete only the first time (if
> you don't need your app to be 100% synchronized with facebook), and queue
> the subsequent jobs to something that handles these updates
> efficiently.  That could stress the storage system with intensive writes
> from time to time, especially at the beginning, but that will converge to a
> mainly read-based application after most active users have used the
> application once. New friendships aren't that frequent (IMHO).
> Maybe NRT developments could be used in this scenario... I don't know much
> more. I don't know anything about how Neo4J works; I used it once, that's
> all.
> Anyway, if you hit write issues, congratulations, your application is being
> used widely; go buy SSD disks :)
>
> Finally, you will then enumerate your nodes with a very quick and efficient
> query friendOf:"someId" .
>
>
> What I meant is that if your application really needs to perform
> queries made of many, many, many, ... really many OR-ed terms, then
> there might exist (though it's not always true) a different design of your
> data model that still lets you fit the use case of a search engine.

I agree. Lucene/solr may need to support many other types of queries used
in traditional databases.
for now, we usually store structured data in an rdbms and full text in
lucene/solr. But the
synchronization of data is a nightmare.  we would like to use one
full-featured solution instead of
integrating many solutions.


>
> This applies to 1 and maybe to 2 too. ( :p 2-2-2 -- never mind )
>
> I don't really understand 3, which seems to be a MinShouldMatch issue.
>
> As I said in the beginning, I'm simply sharing my thoughts! I hope this
> helps...
>
> --
> Tanguy
>
> 2012/6/1 Li Li 
>>
>> hi all,
>>    I am looking for a 'BooleanMatcher' in lucene. for many
>> applications, we don't need to order matched documents by relevance
>> scores; we just want the boolean query. But BooleanScorer/BooleanScorer2
>> is a little bit heavy because of relevance scoring.
>>    one use case is: we have some fields which have a very small number
>> of tokens (usually only one word), such as id, tag or something else.
>>    But we need a query like this: id in (1,3,5...). if using a
>> booleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
>> terms, and BooleanScorer2 uses a priority queue to know how many terms
>> are matched (coord).
>>    Filters may help, but the query can be very complicated (or else,
>> since it still uses BooleanQuery internally, there is a recursion problem)
>>
>>    we may divide the current BooleanScorer into a BooleanMatcher and a
>> Ranker. if we need to score the hit docs, we ask the BooleanScorer for
>> not only the hit ids but also tf/idf, coord or anything we need to use in
>> ranking. but sometimes we only need docIds, and then the BooleanMatcher
>> can optimize its implementation. for the case of many disjunction
>> terms, we can do it like Filter or BooleanScorer instead of
>> BooleanScorer2.
>>
>>    is it possible?
>>
>>    following are some user demands I found on the mailing list; the
>> first one is my own requirement.
>>
>>    1. https://github.com/neo4j/community/issues/494
>>
>>    2. mail to lucene
>>
>> qibaoy...@126.com qibaoy...@126.com via lucene.apache.org
>>
>> May 6
>>
>> to lucene
>> Hi,
>>      I met a problem about how to search many keywords  in about
>> 5,000,000 documents.For example the query may be like "(a1 or 

[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

2012-06-01 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287286#comment-13287286
 ] 

Chris Male commented on LUCENE-3976:


Hi Luca,

I think I'm going to close this and instead we can tackle this on a per-error 
basis.

> Improve error messages for unsupported Hunspell formats
> ---
>
> Key: LUCENE-3976
> URL: https://issues.apache.org/jira/browse/LUCENE-3976
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3976.patch, LUCENE-3976.patch
>
>
> Our hunspell implementation is never going to be able to support the huge 
> variety of formats that are out there, especially since our impl is based on 
> papers written on the topic rather than being a pure port.
> Recently we ran into the following suffix rule:
> {noformat}SFX CA 0 /CaCp{noformat}
> Due to the missing regex conditional, an AOE was being thrown, which made it 
> difficult to diagnose the problem.
> We should instead try to provide better error messages showing what we were 
> unable to parse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

2012-06-01 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287290#comment-13287290
 ] 

Luca Cavanna commented on LUCENE-3976:
--

Ok, that's fine!

> Improve error messages for unsupported Hunspell formats
> ---
>
> Key: LUCENE-3976
> URL: https://issues.apache.org/jira/browse/LUCENE-3976
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3976.patch, LUCENE-3976.patch
>
>
> Our hunspell implementation is never going to be able to support the huge 
> variety of formats that are out there, especially since our impl is based on 
> papers written on the topic rather than being a pure port.
> Recently we ran into the following suffix rule:
> {noformat}SFX CA 0 /CaCp{noformat}
> Due to the missing regex conditional, an AOE was being thrown, which made it 
> difficult to diagnose the problem.
> We should instead try to provide better error messages showing what we were 
> unable to parse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

2012-06-01 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3976.


Resolution: Won't Fix
  Assignee: Chris Male

We will tackle error messages on a per-error basis; thanks for your help 
nonetheless, Luca.

> Improve error messages for unsupported Hunspell formats
> ---
>
> Key: LUCENE-3976
> URL: https://issues.apache.org/jira/browse/LUCENE-3976
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Attachments: LUCENE-3976.patch, LUCENE-3976.patch
>
>
> Our hunspell implementation is never going to be able to support the huge 
> variety of formats that are out there, especially since our impl is based on 
> papers written on the topic rather than being a pure port.
> Recently we ran into the following suffix rule:
> {noformat}SFX CA 0 /CaCp{noformat}
> Due to the missing regex conditional, an AOE was being thrown, which made it 
> difficult to diagnose the problem.
> We should instead try to provide better error messages showing what we were 
> unable to parse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-06-01 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287294#comment-13287294
 ] 

Chris Male commented on LUCENE-3312:


bq. With the Field class implementing IndexableField and StorableField, and on 
retrieval returning a different class that implements only StorableField?

Yes, Nikola has included a StoredDocument class for that.  This would prevent 
users from thinking they can just take a search result and pass it straight 
back in to be indexed.  It creates a clear separation between indexing and 
search results.

bq. But the strong decoupling of stored/indexed parts of a field has its 
benefits too (arbitrary sequences of stored/indexed parts of fields)... and if 
you require a specific implementation at the level of (input) Document then you 
prevent users from using their own impls. of strongly decoupled sequences of 
StoredField/IndexedField.

I agree that there are benefits to the decoupling.  It's just that one of the 
important factors in this issue and other work in and around Document & Field 
is creating a cleaner API for users.  I'm not sure bogging the 
document.Document API down with having to manage both Storable and 
IndexableField instances is worth it.  Field is already basically a parent 
class with the extensive list of specializations we now have.

I'm wondering whether expert users who are using their own 
Storable/IndexableField impls will also want their own 'Document' impls as 
well, maybe to support direct streaming of fields or something.  If we enforce 
this, then we'd have a consistent policy that to use these expert interfaces, 
you're going to have to provide your own implementations for everything.

With all that said, I'm open to a clean API in Document that can do everything 
:)

> Break out StorableField from IndexableField
> ---
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Nikola Tankovic
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
> use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2723 - Failure

2012-06-01 Thread Michael McCandless
On Thu, May 31, 2012 at 8:05 PM, Robert Muir  wrote:
> On Thu, May 31, 2012 at 5:51 PM, Michael McCandless
>  wrote:
>> I think the best option is to ignore the OOME from this test case...?
>>
>> Mike McCandless
>>
>
> I think thats fine for now, but I'm not convinced there is no problem
> at all. However, its not obvious the problem is us, either.
>
> Its easy to see this OOM is related to G1 garbage collector.
>
> This test has failed 3 times in the past couple days (before it never
> failed: i suspect packed ints changes sent it over the edge).
>
> https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2707/
> https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2719/
> https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2723/
>
> All 3 cases are java 7, and all 3 cases uses -XX:+UseG1GC. (Uwe turned
> on GC randomization at lucene revolution)

Aha!  Nice sleuthing :)

So maybe this means G1 isn't as good when heap is limited...

Can we somehow detect / pass a property to the JVM when G1 is in use?
Then we can scope down the "ignore OOME" I committed to only when G1
is in use...
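
Something like this might do for the detection side (a sketch; HotSpot's G1
collector beans have names containing "G1", e.g. "G1 Young Generation"):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    static boolean isG1InUse() {
      // Scan the GC MXBeans registered by the running JVM.
      for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        if (gc.getName().contains("G1")) {
          return true;
        }
      }
      return false;
    }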

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-06-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287364#comment-13287364
 ] 

Michael McCandless commented on LUCENE-3312:


I think I like this decoupling ... for normal users I don't think this
makes the API harder?  They still work with TextField, FloatField,
StoredField, etc.?  It's just that, under the hood, these sugar classes
extend from the right base class (indexed or stored).

Document.add is just type overloaded, but Document.get* will get
messier: we'll need getStored and getIndexed?  I guess that would be
simpler if Document could just store Field instances... hmm.

It would also be a less invasive change for migrating from 4.0 -> 5.0
(assuming this issue is done only for 5.0...) if we didn't do the hard
split; else we need a back-compat story...

{quote}
bq. We already have Document and it's going to become confusing with two 
different Document classes

+1 to use a better name (LuceneDocument? AbstractDocument?).
{quote}

Maybe IndexDocument?  I think it's OK as an interface if we mark it
@lucene.internal?  This is the raw, super-expert low-level API that the indexer
uses to consume documents... it has only 2 methods, and I think for
expert users it could be a hassle if we force the impl to inherit from
our base class...

Should StoredDocument (returned from IR.document) be "read only"?  Like
you can iterate its fields, look them up, etc., but not eg remove them?

We should probably rename document.Field -> document.IndexedField and
document.FieldType -> document.IndexedFieldType?

Also I think we should rename XXXField.TYPE_UNSTORED -> .TYPE, since in
each case there's only 1 TYPE instance for that sugar field?

Separately, I think even for 4.0 we should remove XXXField.TYPE_STORED
from all the sugar fields (TextField, StringField, etc.); expert users
can always make a custom Indexable/Storable/Field/FieldType that both
stores & indexes...
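
The raw two-method surface could be as small as (a sketch; exact generics are
a guess):

{code}
/** @lucene.internal */
public interface IndexDocument {
  Iterable<? extends IndexableField> indexableFields();
  Iterable<? extends StorableField> storableFields();
}
{code}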


> Break out StorableField from IndexableField
> ---
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Nikola Tankovic
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
> use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3842) Analyzing Suggester

2012-06-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287369#comment-13287369
 ] 

Michael McCandless commented on LUCENE-3842:


Hi Sudarshan, thanks for raising this ... I'll have a look...

> Analyzing Suggester
> ---
>
> Key: LUCENE-3842
> URL: https://issues.apache.org/jira/browse/LUCENE-3842
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/spellchecker
>Affects Versions: 3.6, 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-3842-TokenStream_to_Automaton.patch, 
> LUCENE-3842.patch, LUCENE-3842.patch, LUCENE-3842.patch, LUCENE-3842.patch, 
> LUCENE-3842.patch, LUCENE-3842.patch, LUCENE-3842.patch
>
>
> Since we added shortest-path wFSA search in LUCENE-3714, and generified the 
> comparator in LUCENE-3801,
> I think we should look at implementing suggesters that have more capabilities 
> than just basic prefix matching.
> In particular I think the most flexible approach is to integrate with 
> Analyzer at both build and query time,
> such that we build a wFST with:
> input: analyzed text such as ghost0christmas0past <-- byte 0 here is an 
> optional token separator
> output: surface form such as "the ghost of christmas past"
> weight: the weight of the suggestion
> we make an FST with PairOutputs, but only do the shortest path 
> operation on the weight side (like
> the test in LUCENE-3801), at the same time accumulating the output (surface 
> form), which will be the actual suggestion.
> This allows a lot of flexibility:
> * Using even standardanalyzer means you can offer suggestions that ignore 
> stopwords, e.g. if you type in "ghost of chr...",
>   it will suggest "the ghost of christmas past"
> * we can add support for synonyms/wdf/etc at both index and query time (there 
> are tradeoffs here, and this is not implemented!)
> * this is a basis for more complicated suggesters such as Japanese 
> suggesters, where the analyzed form is in fact the reading,
>   so we would add a TokenFilter that copies ReadingAttribute into term text 
> to support that...
> * other general things like offering suggestions that are more "fuzzy" like 
> using a plural stemmer or ignoring accents or whatever.
> According to my benchmarks, suggestions are still very fast with the 
> prototype (e.g. ~ 100,000 QPS), and the FST size does not
> explode (it's short of twice that of a regular wFST, but this is still far 
> smaller than TST or JaSpell, etc).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2012-06-01 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287380#comment-13287380
 ] 

Chris Male commented on LUCENE-3312:


I am all for the decoupling too, just want to thoroughly kick the tyres on this 
one :D  I don't want another FieldType-like discussion.

{quote}
Document.add is just type overloaded, but Document.get* will get
messier: we'll need getStored and getIndexed? I guess that would be
simpler if Document could just store Field instances... hmm.
{quote}

Perhaps if we just limit the API in Document we can handle this okay.  We can 
provide the overloaded add methods, two get methods and 1 remove method.
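
One possible shape for that limited surface (a sketch; it assumes the branch's 
IndexableField/StorableField each expose name()):

{code}
import java.util.ArrayList;
import java.util.List;

public class Document {
  private final List<IndexableField> indexed = new ArrayList<IndexableField>();
  private final List<StorableField> stored = new ArrayList<StorableField>();

  // Overloaded add: the compiler routes each field to the right view.
  // NB: a Field implementing both views would make add(field) ambiguous,
  // so callers would have to pick a view explicitly.
  public void add(IndexableField f) { indexed.add(f); }
  public void add(StorableField f) { stored.add(f); }

  public List<IndexableField> getIndexed(String name) {
    List<IndexableField> result = new ArrayList<IndexableField>();
    for (IndexableField f : indexed) {
      if (f.name().equals(name)) result.add(f);
    }
    return result;
  }

  public List<StorableField> getStored(String name) {
    List<StorableField> result = new ArrayList<StorableField>();
    for (StorableField f : stored) {
      if (f.name().equals(name)) result.add(f);
    }
    return result;
  }
}
{code}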

{quote}
Maybe IndexDocument? I think it's OK as an interface if we mark it
@lucene.internal? This is the raw, super expert low-level that indexer
uses to consume documents... it has only 2 methods, and I think for
expert users it could be a hassle if we force the impl to inherit from
our base class...
{quote}

+1 to both the name and the handling of the interface.

{quote}
Should StoredDocument (returned from IR.document) be "read only"? Like
you can iterate its fields, look them up, etc., but not eg remove them?
{quote}

+1 You shouldn't really need to remove fields; you can achieve that by not 
retrieving them in the first place.


> Break out StorableField from IndexableField
> ---
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Nikola Tankovic
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, 
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up to
> use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1485) PayloadTermQuery support

2012-06-01 Thread Roland Deck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287384#comment-13287384
 ] 

Roland Deck commented on SOLR-1485:
---

Hi
I tried the PayloadTermQueryPlugin today.
To get the scores as mentioned above I had to change the code a little. 

Here is the relevant code fragment:

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // rdeck: hint: let's try to set includeSpanScore to true.
        // => Yes, it works (after having re-indexed all documents)!
        return new PayloadTermQuery(
            new Term(localParams.get(QueryParsing.F),
                     localParams.get(QueryParsing.V)),
            createPayloadFunction(localParams.get("func")),
            true); // was originally false instead of true
      }
    };
  }

with includeSpanScore = false, I get score = payload value
with includeSpanScore = true, the payload takes part in the score calculation

I have some questions left:

1) Why is the PayloadTermQuery limited to just one field? Or will this change?
2) How can I mix queries containing parts which are payload-dependent and 
others which aren't?

> PayloadTermQuery support
> 
>
> Key: SOLR-1485
> URL: https://issues.apache.org/jira/browse/SOLR-1485
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Erik Hatcher
> Attachments: PayloadTermQueryPlugin.java
>
>
> Solr currently has no support for Lucene's PayloadTermQuery, yet it has 
> support for indexing payloads. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4098:
-

Attachment: LUCENE-4098.patch

Slightly updated patch that fixes the computation of maxValue in GrowableWriter 
and uses | instead of Math.max to compute the max value when performing a bulk 
set (since GrowableWriter only works with unsigned integers).

As a side note, the reason why I didn't write bulk get and set methods for 
Packed64 is that I didn't find a way to do it efficiently (= with fewer 
instructions and no conditionals) without writing specialized methods for every 
bitsPerValue.
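
For GrowableWriter the bulk set then boils down to something like (a sketch; 
ensureCapacity and the bulk set(...) signature are assumptions based on this 
patch):

{code}
// Track the running maximum with | instead of Math.max: for the unsigned
// values GrowableWriter handles, OR-ing can only raise the highest set bit.
long bits = 0;
for (int i = off; i < off + len; ++i) {
  bits |= values[i];
}
ensureCapacity(PackedInts.bitsRequired(bits)); // grow at most once
current.set(index, values, off, len);          // then delegate the bulk set
{code}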

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend to
> be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow to implement efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterpart could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4098:
-

Attachment: (was: LUCENE-4098.patch)

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend to
> be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow to implement efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterpart could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4063) FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens

2012-06-01 Thread Tanguy Moal (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287408#comment-13287408
 ] 

Tanguy Moal commented on LUCENE-4063:
-

I agree with both of you; it sounds like a design change.

I think Jacques Savoy's algorithm was intended to be used on words, not on 
numbers or mixes of both (as in 22h00).

This is true for any stemmer, I think. That's why on the mailing list I also 
suggested we could have each stemmer share a common interface that would filter 
non-stemmable literals out of the way. That could prevent the same issue from 
arising in a different stemming implementation.

I'm just saying this as I think about it.

> FrenchLightStemmer performs abusive compression of (arbitrary) repeated 
> characters in long tokens
> -
>
> Key: LUCENE-4063
> URL: https://issues.apache.org/jira/browse/LUCENE-4063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.4, 4.0
>Reporter: Tanguy Moal
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4063.patch, SOLR-3463.patch, SOLR-3463.patch, 
> SOLR-3463.patch
>
>
> FrenchLightStemmer performs aggressive deletions on repeated character 
> sequences, even on numbers.
> This might be unexpected during full text search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4063) FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens

2012-06-01 Thread Tanguy Moal (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287408#comment-13287408
 ] 

Tanguy Moal edited comment on LUCENE-4063 at 6/1/12 1:55 PM:
-

I agree with both of you; it sounds like a design change.

I think Jacques Savoy's algorithm was intended to be used on words, not on 
numbers or mixes of both (as in 22h00).

This is true for any stemmer, I think. That's why on the mailing list I also 
suggested we could have each stemmer share a common interface that would filter 
non-stemmable literals out of the way. That could prevent the same issue from 
arising in a different stemming implementation.

I'm just saying this as I think about it.

  was (Author: tanguy):
I agree with both of you, it sounds like a design change.

I think Jacques Savoy's algorithm was intended to be used on words. Not on 
numbers, or mixes of both (like in 22h00).

Which is true for any stemmer, I think. That's why on the mailing I also 
suggested we could have each stemmer share a common interface that would filter 
non-stemmable literals out of the way. That could prevent the same issue to 
raise from a different stemming implementation.

I'm just saying this as I think about it.
  
> FrenchLightStemmer performs abusive compression of (arbitrary) repeated 
> characters in long tokens
> -
>
> Key: LUCENE-4063
> URL: https://issues.apache.org/jira/browse/LUCENE-4063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.4, 4.0
>Reporter: Tanguy Moal
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4063.patch, SOLR-3463.patch, SOLR-3463.patch, 
> SOLR-3463.patch
>
>
> FrenchLightStemmer performs aggressive deletions on repeated character 
> sequences, even on numbers.
> This might be unexpected during full text search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4099) Remove generics from SpatialStrategy and remove SpatialFieldInfo

2012-06-01 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287409#comment-13287409
 ] 

David Smiley commented on LUCENE-4099:
--

Although the SpatialStrategies have become fairly lightweight, I'd like to see 
if there is a path forward that retains reusing SpatialStrategies across 
requests, without the API awkwardness of SpatialFieldInfo as it is implemented 
today.  Perhaps a SpatialStrategy can be created per field (although it may 
technically use more than one field), and then there would be no need for 
SpatialFieldInfo or anything like it.

> Remove generics from SpatialStrategy and remove SpatialFieldInfo
> 
>
> Key: LUCENE-4099
> URL: https://issues.apache.org/jira/browse/LUCENE-4099
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Chris Male
>Priority: Minor
>
> Some time ago I added SpatialFieldInfo as a way for SpatialStrategies to 
> declare what information they needed per request.  This meant that a Strategy 
> could be used across multiple requests.  However, it doesn't really need to be 
> that way any more: Strategies are light to instantiate, and the generics are 
> just clumsy and annoying.
> Instead, Strategies should just define what they need in their constructor. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4099) Remove generics from SpatialStrategy and remove SpatialFieldInfo

2012-06-01 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287413#comment-13287413
 ] 

Chris Male commented on LUCENE-4099:


That can definitely happen.  I guess that instead of being instantiated per 
request, they will be instantiated per configuration.  

> Remove generics from SpatialStrategy and remove SpatialFieldInfo
> 
>
> Key: LUCENE-4099
> URL: https://issues.apache.org/jira/browse/LUCENE-4099
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Chris Male
>Priority: Minor
>
> Some time ago I added SpatialFieldInfo as a way for SpatialStrategies to 
> declare what information they needed per request.  This meant that a Strategy 
> could be used across multiple requests.  However, it doesn't really need to be 
> that way any more: Strategies are light to instantiate, and the generics are 
> just clumsy and annoying.
> Instead, Strategies should just define what they need in their constructor. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3500) ERROR Unable to execute query

2012-06-01 Thread ourlight (JIRA)
ourlight created SOLR-3500:
--

 Summary: ERROR Unable to execute query
 Key: SOLR-3500
 URL: https://issues.apache.org/jira/browse/SOLR-3500
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 3.5
Reporter: ourlight
 Fix For: 3.5


I use many tables for indexing. 

During dataimport, I get errors for some tables like "Unable to execute query". 
But the next time, when I retry the dataimport for that table, it completes 
successfully without any error.

`
[Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in entity : 
test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to execute query: 
SELECT Title, url, synonym, description FROM test_5 WHERE status in ('1','s')  
Processing Document # 11046

at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)

Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7152)
at com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3870)
at 
com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3407)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2384)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
... 11 more
`

I set connectTimeout, readTimeout, readOnly=true, 
transactionIsolation="TRANSACTION_READ_COMMITTED", and 
holdability="CLOSE_CURSORS_AT_COMMIT" in data-config.xml, but I get the same 
errors. What is this error? How can I index all of my tables?



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4063) FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens

2012-06-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287447#comment-13287447
 ] 

Robert Muir commented on LUCENE-4063:
-

{quote}
That's why on the mailing list I also suggested we could have each stemmer 
share a common interface that would filter non-stemmable literals out of the way
{quote}

We actually have this in place, but it's too limited. It's called 
KeywordAttribute. When this is set, the stemmer will not touch the word.

Currently the only way to set this out of the box is to use 
KeywordMarkerFilter, which takes a Set of protected words.

But to make your idea more flexible, I could imagine a couple more filters:
* one that marks a token as a keyword based on a set of types. In this case you 
would just add NUM to that set, and no stemmers would touch any numbers. Of 
course for French this is solved already, but imagine you are using the 
URLEmail tokenizer: I think a set like { URL, EMAIL } would be very useful; 
otherwise stemmers will probably muck with them.
* one that marks a token as a keyword based on a regular expression. This could 
be good for fine-tuning stemmers for a lot of general-purpose needs: e.g. on 
the mailing list before, someone was unhappy about how Russian stemmers would 
treat Russian place names, and they had a certain set of suffixes they didn't 
want stemmed.

Anyway, I would really like to see these filters; I think they would be pretty 
simple to implement as well. 
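
A minimal sketch of the first (type-based) idea, using only the existing 
KeywordAttribute and TypeAttribute; the filter name is hypothetical:

{code:java}
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/** Hypothetical filter: marks tokens as keywords when their type is in a
 *  protected set, so downstream stemmers (which honor KeywordAttribute)
 *  leave them alone. */
public final class TypeKeywordMarkerFilter extends TokenFilter {
  private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
  private final Set<String> protectedTypes;

  public TypeKeywordMarkerFilter(TokenStream in, Set<String> protectedTypes) {
    super(in);
    this.protectedTypes = protectedTypes;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (protectedTypes.contains(typeAtt.type())) {
      keywordAtt.setKeyword(true); // e.g. a set like { "<NUM>", "<URL>", "<EMAIL>" }
    }
    return true;
  }
}
{code}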

> FrenchLightStemmer performs abusive compression of (arbitrary) repeated 
> characters in long tokens
> -
>
> Key: LUCENE-4063
> URL: https://issues.apache.org/jira/browse/LUCENE-4063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.4, 4.0
>Reporter: Tanguy Moal
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4063.patch, SOLR-3463.patch, SOLR-3463.patch, 
> SOLR-3463.patch
>
>
> FrenchLightStemmer performs aggressive deletions on repeated character 
> sequences, even on numbers.
> This might be unexpected during full text search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3500) ERROR Unable to execute query

2012-06-01 Thread ourlight (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ourlight updated SOLR-3500:
---

Description: 
I use many tables for indexing. 

During dataimport, I get errors for some tables like "Unable to execute query". 
But the next time, when I retry the dataimport for that table, it completes 
successfully without any error.


[Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in 
entity : 
test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to execute query: 
SELECT Title, url, synonym, description FROM test_5 WHERE status in 
('1','s')  Processing Document # 11046

at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)

Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7152)
at 
com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3870)
at 
com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3407)
at 
com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2384)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
at 
com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
... 11 more



I set connectTimeout, readTimeout, readOnly=true, 
transactionIsolation="TRANSACTION_READ_COMMITTED", and 
holdability="CLOSE_CURSORS_AT_COMMIT" in data-config.xml, but I get the same 
errors. What is this error? How can I index all of my tables?



  was:
I use many tables for indexing. 

During dataimport, I get errors for some tables like "Unable to execute query". 
But next time, when I try to dataimport for that table, I can do successfully 
without any error.

`
[Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in entity : 
test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to execute query: 
SELECT Title, url, synonym, description FROM test_5 WHERE status in ('1','s')  
Processing Document # 11046

at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
a

[jira] [Created] (SOLR-3501) solr, how can I make search query with fixed slop(distance)

2012-06-01 Thread ourlight (JIRA)
ourlight created SOLR-3501:
--

 Summary: solr, how can I make search query with fixed 
slop(distance)
 Key: SOLR-3501
 URL: https://issues.apache.org/jira/browse/SOLR-3501
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 3.5
Reporter: ourlight
 Fix For: 3.5


I want to search for data within a fixed slop in Solr.

For example, I build the search query 'title:+solr +user ~2' to search for 
documents that have 'solr' and 'user' within a slop of 2, but it does not work 
in Solr. I have tried the parameters defType=edismax, pf, qs, and ps; they 
change only the ordering of the results, not which documents match.

If I use a phrase query like 'title:"solr user"~2', it cannot match documents 
like "... users for solr ..." where the keywords are not in order.

How can I do this? Please help.
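
For what it's worth, at the Lucene API level an unordered proximity match can 
be expressed with a SpanNearQuery whose inOrder flag is false; a minimal sketch 
using the field and terms from the example above (the class name is mine):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class UnorderedProximityDemo {
  /** Matches "solr" and "user" in the title field within 2 positions of each
   *  other, in either order, so "... users for solr ..." can match too
   *  (assuming the analyzer reduces "users" to "user" at index time). */
  public static SpanQuery build() {
    return new SpanNearQuery(
        new SpanQuery[] {
            new SpanTermQuery(new Term("title", "solr")),
            new SpanTermQuery(new Term("title", "user"))
        },
        2,      // slop: max positions between the terms
        false); // inOrder=false: unordered proximity
  }
}
{code}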

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3500) ERROR Unable to execute query

2012-06-01 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287462#comment-13287462
 ] 

Jack Krupansky edited comment on SOLR-3500 at 6/1/12 2:52 PM:
--

Doesn't sound like a Solr bug (or new feature). Take it over to the solr-user 
mailing list for assistance.

  was (Author: jkrupan):
Doesn't sound like a Solr bug. Take it over to the solr-user mailing list 
for assistance.
  
> ERROR Unable to execute query
> -
>
> Key: SOLR-3500
> URL: https://issues.apache.org/jira/browse/SOLR-3500
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: ourlight
> Fix For: 3.5
>
>
> I use many tables for indexing. 
> During dataimport, I get errors for some tables like "Unable to execute 
> query". But the next time, when I retry the dataimport for that table, it 
> completes successfully without any error.
>   [Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in 
> entity : 
>   test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
>   Unable to execute query: 
>   SELECT Title, url, synonym, description FROM test_5 WHERE status in 
> ('1','s')  Processing Document # 11046
>   at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>   at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>   at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>   Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
>   at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7152)
>   at 
> com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3870)
>   at 
> com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3407)
>   at 
> com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2384)
>   at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
>   at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
>   at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
>   at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>   at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
>   at 
> com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
>   ... 11 more
> I set connectTimeout, readTimeout, readOnly=true, 
> transactionIsolation="TRANSACTION_READ_COMMITTED", and 
> holdability="CLOSE_CURSORS_AT_COMMIT" in data-config.xml, but I get the same 
> errors. 
> What is this error? How can I index all of my tables?

--
This message is

[jira] [Commented] (SOLR-3500) ERROR Unable to execute query

2012-06-01 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287462#comment-13287462
 ] 

Jack Krupansky commented on SOLR-3500:
--

Doesn't sound like a Solr bug. Take it over to the solr-user mailing list for 
assistance.

> ERROR Unable to execute query
> -
>
> Key: SOLR-3500
> URL: https://issues.apache.org/jira/browse/SOLR-3500
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: ourlight
> Fix For: 3.5
>
>
> I use many tables for indexing. 
> During dataimport, I get errors for some tables like "Unable to execute 
> query". But the next time, when I retry the dataimport for that table, it 
> completes successfully without any error.
>   [Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in 
> entity : 
>   test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
>   Unable to execute query: 
>   SELECT Title, url, synonym, description FROM test_5 WHERE status in 
> ('1','s')  Processing Document # 11046
>   at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>   at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>   at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>   Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
>   at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7152)
>   at 
> com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3870)
>   at 
> com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3407)
>   at 
> com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2384)
>   at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
>   at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
>   at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
>   at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>   at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
>   at 
> com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
>   ... 11 more
> I set connectTimeout, readTimeout, readOnly=true, 
> transactionIsolation="TRANSACTION_READ_COMMITTED", and 
> holdability="CLOSE_CURSORS_AT_COMMIT" in data-config.xml, but I get the same 
> errors. 
> What is this error? How can I index all of my tables?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information

[jira] [Created] (SOLR-3502) [Copy Field][Importing] Copy Field duplicates a field, and a multivalued field can be created without the field being multivalued in the schema

2012-06-01 Thread Alexis Torres Paderewski (JIRA)
Alexis Torres Paderewski created SOLR-3502:
--

 Summary: [Copy Field][Importing] Copy Field duplicates a field, and 
a multivalued field can be created without the field being multivalued in the schema
 Key: SOLR-3502
 URL: https://issues.apache.org/jira/browse/SOLR-3502
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
 Environment: two Solr 3.1 on linux.
Reporter: Alexis Torres Paderewski
Priority: Minor


We have two fields defined identically on both Solrs (the <field/> definitions 
were stripped from this message):

We have a SolrDocument reader (a SolrJ client without POJO binding) that takes 
all docs from one Solr and writes them as SolrInputDocuments to the other Solr.

In the B field on the target Solr we ended up with an array containing a 
duplicate of the value we have on the first Solr. How could Solr internally 
break the schema?
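
A hedged sketch of one plausible mechanism and workaround, assuming the B field 
is a copyField target in the target schema (the field name "b" and the helper 
class are illustrative): re-sending B's stored value and then letting copyField 
fire again at index time yields two values, so skipping the target during 
conversion avoids the duplicate.

{code:java}
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CopyFieldSafeConverter {
  /** Copies every field except the copyField target; the target schema's
   *  copyField directive will repopulate "b" from its source at index time. */
  public static SolrInputDocument convert(SolrDocument src) {
    SolrInputDocument dst = new SolrInputDocument();
    for (String name : src.getFieldNames()) {
      if ("b".equals(name)) {
        continue; // illustrative copyField target: do not re-send its value
      }
      dst.addField(name, src.getFieldValue(name));
    }
    return dst;
  }
}
{code}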



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287482#comment-13287482
 ] 

Otis Gospodnetic commented on LUCENE-2357:
--

Woho, I love seeing old issues getting love like this! :)
Has anyone measured (or at least eyeballed) how much RAM this saves?


> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32 bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4098:
--

Assignee: Michael McCandless

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> be slower than direct implementations, this can take a lot of time.
> For example, under some scenarii, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow to implement efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterpart could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14493 - Failure

2012-06-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14493/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=74 closes=73

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73
at __randomizedtesting.SeedInfo.seed([A0709F31D8E6B131]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 9947 lines...]
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
corePoolSize to: 0
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maximumPoolSize to: 2147483647
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maxThreadIdleTime to: 5
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
sizeOfQueue to: -1
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
fairnessPolicy to: false
   [junit4]   2> 127 T873 oascsi.HttpClientUtil.createClient Creating new http 
client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
   [junit4]   2> 137 T873 oasc.CoreContainer.create Creating SolrCore 
'collection1' using instanceDir: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/.
   [junit4]   2> 137 T873 oasc.SolrResourceLoader.<init> new SolrResourceLoader 
for directory: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./'
   [junit4]   2> 152 T873 oasc.SolrConfig.<init> Using Lucene MatchVersion: 
LUCENE_50
   [junit4]   2> 176 T873 oasc.SolrConfig.<init> Loaded SolrConfig: 
solrconfig.xml
   [junit4]   2> 177 T873 oass.IndexSchema.readSchema Reading Solr Schema
   [junit4]   2> 179 T873 oass.IndexSchema.readSchema Schema name=test
   [junit4]   2> 186 T873 oass.IndexSchema.readSchema WARNING no default search 
field specified in schema.
   [junit4]   2> 187 T873 oass.IndexSchema.readSchema unique key field: id
   [junit4]   2> 188 T873 oasc.SolrCore.<init> [collection1] Opening new 
SolrCore at 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./,
 
dataDir=./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/data/
   [junit4]   2> 188 T873 oasc.SolrCore.<init> JMX monitoring not detected for 
core: collection1
   [junit4]   2> 189 T873 oasc.SolrCore.initIndex WARNING [collection1] Solr 
index directory 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/data/index'
 doesn't exist. Creating new index...
   [junit4]   2> 192 T873 oasc.SolrDeletionPolicy.onCommit 
SolrDeletionPolicy.onCommit: commits:num=1
   [junit4]   2>
commit{dir=/usr/home/hudson/hudson-

Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 14493 - Failure

2012-06-01 Thread Uwe Schindler
Some problem with the regex!

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Apache Jenkins Server wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14493/

1 tests failed.
FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=74 closes=73

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73
at __randomizedtesting.SeedInfo.seed([A0709F31D8E6B131]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 9947 lines...]
[junit4] 2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
corePoolSize to: 0
[junit4] 2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maximumPoolSize to: 2147483647
[junit4] 2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maxThreadIdleTime to: 5
[junit4] 2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
sizeOfQueue to: -1
[junit4] 2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
fairnessPolicy to: false
[junit4] 2> 127 T873 oascsi.HttpClientUtil.createClient Creating new http 
client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
[junit4] 2> 137 T873 oasc.CoreContainer.create Creating SolrCore 'collection1' 
using instanceDir: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/.
[junit4] 2> 137 T873 oasc.SolrResourceLoader.<init> new SolrResourceLoader for 
directory: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./'
[junit4] 2> 152 T873 oasc.SolrConfig.<init> Using Lucene MatchVersion: LUCENE_50
[junit4] 2> 176 T873 oasc.SolrConfig.<init> Loaded SolrConfig: solrconfig.xml
[junit4] 2> 177 T873 oass.IndexSchema.readSchema Reading Solr Schema
[junit4] 2> 179 T873 oass.IndexSchema.readSchema Schema name=test
[junit4] 2> 186 T873 oass.IndexSchema.readSchema WARNING no default search 
field specified in schema.
[junit4] 2> 187 T873 oass.IndexSchema.readSchema unique key field: id
[junit4] 2> 188 T873 oasc.SolrCore.<init> [collection1] Opening new SolrCore at 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./,
 
dataDir=./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/data/
[junit4] 2> 188 T873 oasc.SolrCore.<init> JMX monitoring not detected for core: 
collection1
[junit4] 2> 189 T873 oasc.SolrCore.initIndex WARNING [collection1] Solr index 
directory 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/data/index'
 doesn't exist. Creating new index...
[junit4] 2> 192 T873 oasc.SolrDeletionPolicy.onCommit 
SolrDeletionPolicy.onCommit: commits:num=1
[junit4] 2>   

[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287499#comment-13287499
 ] 

Adrien Grand commented on LUCENE-2357:
--

Hi Otis,

Before this change, each doc map used ~ {{maxDoc * 32}} bits, while they now use 
~ {{maxDoc * lg(min(numDocs, numDeletedDocs))}} bits (where lg is the ceiling 
of the log in base 2). So even in the worst case (numDocs = numDeletedDocs = 
maxDoc / 2), the improvement is {{(31 - lg(maxDoc)) / 32}}. On a segment with 
maxDoc=1000, this is a 22% improvement. But the improvement is much better 
when the number of deleted documents is close to 0 or to maxDoc. For example, 
if your segment has maxDoc=1000 and numDeletedDocs=10, the improvement 
({{(32 - lg(min(numDocs, numDeletedDocs))) / 32}}) is close to 50%. If 
numDeletedDocs=100, the improvement is close to 80%.
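
To make the arithmetic concrete, a worked instance of the second formula, with 
segment sizes chosen purely for illustration (they are not the figures above):

{code}
% Illustrative only: a segment with maxDoc = 2^24 docs and numDeletedDocs = 2^10.
% Each doc-map entry then needs lg(2^10) = 10 bits instead of 32:
\[
\frac{32 - \lceil \log_2 \min(\mathrm{numDocs},\, \mathrm{numDeletedDocs}) \rceil}{32}
  = \frac{32 - 10}{32} \approx 69\%\ \text{of the transient RAM saved.}
\]
{code}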

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32 bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-3500) ERROR Unable to execute query

2012-06-01 Thread ourlight (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ourlight closed SOLR-3500.
--

Resolution: Fixed

Sorry, I have moved it to the solr-user mailing list.

> ERROR Unable to execute query
> -
>
> Key: SOLR-3500
> URL: https://issues.apache.org/jira/browse/SOLR-3500
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: ourlight
> Fix For: 3.5
>
>
> I use many tables for indexing. 
> During dataimport, I get errors for some tables like "Unable to execute 
> query". But the next time, when I retry the dataimport for that table, it 
> completes successfully without any error.
>   [Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in 
> entity : 
>   test_5:org.apache.solr.handler.dataimport.DataImportHandlerException: 
>   Unable to execute query: 
>   SELECT Title, url, synonym, description FROM test_5 WHERE status in 
> ('1','s')  Processing Document # 11046
>   at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>   at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>   at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>   at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>   Caused by: java.sql.SQLException: ResultSet is from UPDATE. No Data.
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
>   at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7152)
>   at 
> com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3870)
>   at 
> com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3407)
>   at 
> com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2384)
>   at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
>   at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
>   at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
>   at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>   at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
>   at 
> com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
>   ... 11 more
> I set connectTimeout, readTimeout, readOnly=true, 
> transactionIsolation="TRANSACTION_READ_COMMITTED", and 
> holdability="CLOSE_CURSORS_AT_COMMIT" in data-config.xml, but I get the same 
> errors. 
> What is this error? How can I index all of my tables?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---

[jira] [Closed] (SOLR-3501) solr, how can I make search query with fixed slop(distance)

2012-06-01 Thread ourlight (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ourlight closed SOLR-3501.
--

Resolution: Fixed

> solr, how can I make search query with fixed slop(distance)
> ---
>
> Key: SOLR-3501
> URL: https://issues.apache.org/jira/browse/SOLR-3501
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: ourlight
> Fix For: 3.5
>
>
> I want to search for data within a fixed slop in Solr.
> For example, I build the search query 'title:+solr +user ~2' to search for 
> documents that have 'solr' and 'user' within a slop of 2, but it does not 
> work in Solr. I have tried the parameters defType=edismax, pf, qs, and ps; 
> they change only the ordering of the results, not which documents match.
> If I use a phrase query like 'title:"solr user"~2', it cannot match 
> documents like "... users for solr ..." where the keywords are not in order.
> How can I do this? Please help.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287508#comment-13287508
 ] 

Adrien Grand commented on LUCENE-2357:
--

This is the theoretical improvement. However, in order not to slow merging down 
too much, I instantiate the {{PackedInts.Mutable}} that holds the doc map with 
{{acceptableOverheadRatio=PackedInts.FAST=50%}} (see LUCENE-4062), so the 
actual improvement might be a little worse than the theoretical improvement. If 
you are more interested in memory usage than in merge speed, you could still 
reach the theoretical improvement by replacing {{PackedInts.FAST}} with 
{{PackedInts.COMPACT}} in {{MergeState.DocMap.build}}.
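
A minimal sketch of that knob, assuming the 4.x 
{{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadRatio)}} 
factory (the segment size below is illustrative):

{code:java}
import org.apache.lucene.util.packed.PackedInts;

public class DocMapOverheadDemo {
  public static void main(String[] args) {
    int maxDoc = 10000000; // illustrative segment size
    int bitsRequired = PackedInts.bitsRequired(maxDoc - 1);

    // COMPACT: no wasted bits, smallest memory, potentially slower get/set.
    PackedInts.Mutable compact =
        PackedInts.getMutable(maxDoc, bitsRequired, PackedInts.COMPACT);

    // FAST: allows up to 50% extra bits per value so a faster
    // implementation (e.g. a single-block layout) can be chosen.
    PackedInts.Mutable fast =
        PackedInts.getMutable(maxDoc, bitsRequired, PackedInts.FAST);

    System.out.println("compact: " + compact.getBitsPerValue() + " bits/value");
    System.out.println("fast:    " + fast.getBitsPerValue() + " bits/value");
  }
}
{code}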

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32 bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287510#comment-13287510
 ] 

Robert Muir commented on LUCENE-2357:
-

Do we have any high-level idea of what the performance cost of COMPACT vs FAST 
is for merging?
(e.g. typical case of Lucene40 codec). Is COMPACT maybe a good tradeoff?

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 14493 - Failure

2012-06-01 Thread Steven A Rowe
Yeah there are four regexes now - only one seems problematic.  On mobile now - 
I'll fix when I get to a workstation later today.

Uwe Schindler  wrote:



Some problem with the regex!

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Apache Jenkins Server wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14493/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=74 closes=73

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73
at __randomizedtesting.SeedInfo.seed([A0709F31D8E6B131]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 9947 lines...]
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
corePoolSize to: 0
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maximumPoolSize to: 2147483647
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
maxThreadIdleTime to: 5
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
sizeOfQueue to: -1
   [junit4]   2> 126 T873 oashc.HttpShardHandlerFactory.getParameter Setting 
fairnessPolicy to: false
   [junit4]   2> 127 T873 oascsi.HttpClientUtil.createClient Creating new http 
client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
   [junit4]   2> 137 T873 oasc.CoreContainer.create Creating SolrCore 
'collection1' using instanceDir: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/.
   [junit4]   2> 137 T873 oasc.SolrResourceLoader.<init> new SolrResourceLoader for 
directory: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./'
   [junit4]   2> 152 T873 oasc.SolrConfig.<init> Using Lucene MatchVersion: 
LUCENE_50
   [junit4]   2> 176 T873 oasc.SolrConfig.<init> Loaded SolrConfig: 
solrconfig.xml
   [junit4]   2> 177 T873 oass.IndexSchema.readSchema Reading Solr Schema
   [junit4]   2> 179 T873 oass.IndexSchema.readSchema Schema name=test
   [junit4]   2> 186 T873 oass.IndexSchema.readSchema WARNING no default search 
field specified in schema.
   [junit4]   2> 187 T873 oass.IndexSchema.readSchema unique key field: id
   [junit4]   2> 188 T873 oasc.SolrCore.<init> [collection1] Opening new 
SolrCore at 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/./,
dataDir=./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338565063406/slave/data/
   [junit4]   2> 188 T873 oasc.SolrCore.<init> JMX monitoring not detected for 
core: collection1
   [junit4]   2> 189 T873 oasc.SolrCore.initIndex WARNING [colle

[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287521#comment-13287521
 ] 

Adrien Grand commented on LUCENE-2357:
--

While working on LUCENE-4062, I had in mind that {{FAST}} (50%) would be ok for 
transient data structures while {{COMPACT}} (0%) and {{DEFAULT}} (20%) would be 
better for big and long-living structures depending on the performance 
requirements. However, it is true that the DocMap might not be the bottleneck 
for merging (especially since this operation involves disk accesses). I can try 
to run some benchmarks next week to find out whether {{COMPACT}} (or maybe 
{{DEFAULT}}) could be a better tradeoff.
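
As a rough illustration of what such a benchmark could look like (a sketch only, 
assuming the 4.x PackedInts API; not the actual benchmark that will be run):

{code}
import org.apache.lucene.util.packed.PackedInts;

public class OverheadRatioBench {
  public static void main(String[] args) {
    float[] ratios = { PackedInts.COMPACT, PackedInts.DEFAULT, PackedInts.FAST };
    for (float ratio : ratios) {
      // 1M values of 17 bits each; the ratio decides the backing implementation.
      PackedInts.Mutable m = PackedInts.getMutable(1 << 20, 17, ratio);
      long start = System.nanoTime();
      for (int i = 0; i < m.size(); ++i) {
        m.set(i, i & 0x1FFFF);
      }
      long sum = 0;
      for (int i = 0; i < m.size(); ++i) {
        sum += m.get(i);
      }
      long ms = (System.nanoTime() - start) / 1000000;
      System.out.println("ratio=" + ratio + ": " + ms + " ms (sum=" + sum + ")");
    }
  }
}
{code}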

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4098:
---

Attachment: LUCENE-4098.patch

The patch looks great!

I wrote a new random test (in attached patch) that just randomly bulk copies a 
bunch of slices from one packed ints array to another and then asserts the 
values are correct ... but it's failing.  I'm not sure why yet, and it's 
entirely possible it's a test bug!  If you run with -Dtests.verbose=true it 
prints details...
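
For reference, a stripped-down sketch of the kind of check the test performs, 
using the bulk get/set added by this patch (sizes and seed handling are 
simplified; this is not the attached test itself):

{code}
import java.util.Random;
import org.apache.lucene.util.packed.PackedInts;

public class RandomBulkCopySketch {
  public static void main(String[] args) {
    Random random = new Random(42);
    int valueCount = 1000;
    PackedInts.Mutable packed1 = PackedInts.getMutable(valueCount, 20, PackedInts.COMPACT);
    PackedInts.Mutable packed2 = PackedInts.getMutable(valueCount, 20, PackedInts.FAST);
    for (int i = 0; i < valueCount; ++i) {
      packed1.set(i, random.nextInt(1 << 20)); // values fit in 20 bits
    }
    long[] buffer = new long[64];
    int start = random.nextInt(valueCount - buffer.length);
    int got = packed1.get(start, buffer, 0, buffer.length); // may return < len
    int sot = packed2.set(start, buffer, 0, got);           // write back what was read
    for (int i = 0; i < sot; ++i) {
      if (packed1.get(start + i) != packed2.get(start + i)) {
        throw new AssertionError("mismatch at index " + (start + i));
      }
    }
    System.out.println("verified " + sot + " copied values");
  }
}
{code}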

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-06-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287563#comment-13287563
 ] 

Michael McCandless commented on LUCENE-2858:


Not entirely sure why, but it looks like the commits here slowed down our NRT 
reopen latency.  If you look at the nightly bench graph 
(http://people.apache.org/~mikemccand/lucenebench/nrt.html) and click + drag from 
Jan 2012 to today, annotation R shows we went from ~46 msec NRT reopen 
latency to ~50 msec ... could just be HotSpot being upset...

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> --
>
> Key: LUCENE-2858
> URL: https://issues.apache.org/jira/browse/LUCENE-2858
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2858-FCinsanity.patch, 
> LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get 
> back a DirectoryReader, which is a composite reader. The interface of 
> IndexReader now has lots of methods that simply throw UOE (in fact more than 
> 50% of the commonly used methods are unusable now). This 
> confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a 
> separate API. After that, you are no longer able to get a TermsEnum from 
> those composite readers without wrapping. We currently have helper classes for 
> wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
> Multi*); those should be retrofitted to implement the correct classes 
> (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
> reader as ctor param, maybe it could also simply take a List). 
> In my opinion, maybe composite readers could implement some collection APIs 
> and also have the ReaderUtil method directly built in (possibly as a "view" 
> in the util.Collection sense). In general, composite readers do not really 
> need to look like the previous IndexReaders; they could simply be a 
> "collection" of SegmentReaders with some functionality like reopen.
> On the other side, do atomic readers need reopen logic anymore? When a 
> segment changes, do you need a new atomic reader? Maybe because of deletions 
> that's not the best idea, but we should investigate. Maybe make the whole 
> reopen logic simpler to use (at least on the collection reader level).
> We should decide on good names; I have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287581#comment-13287581
 ] 

Adrien Grand commented on LUCENE-4098:
--

The bulk get and set are not guaranteed to return/set exactly len longs (so 
that they can stay at a block boundary to make the subsequent reads/writes 
faster). So I think
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, len);
assertTrue(sot <= len);
{code}
should be replaced with
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, got);
assertTrue(sot <= len);
{code}

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287581#comment-13287581
 ] 

Adrien Grand edited comment on LUCENE-4098 at 6/1/12 6:24 PM:
--

The bulk get and set are not guaranteed to return/set exactly len longs (so 
that they can stay at a block boundary to make the subsequent reads/writes 
faster). So I think
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, len);
assertTrue(sot <= len);
{code}
should be replaced with
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, got);
assertTrue(sot <= got);
{code}

  was (Author: jpountz):
The bulk get and set are not guaranteed to return/set exactly len longs (so 
that they can stay at a block boundary to make the subsequent reads/writes 
faster). So I think
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, len);
assertTrue(sot <= len);
{code}
should be replaced with
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, got);
assertTrue(sot <= len);
{code}
  
> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287581#comment-13287581
 ] 

Adrien Grand edited comment on LUCENE-4098 at 6/1/12 6:23 PM:
--

The bulk get and set are not guaranteed to return/set exactly len longs (so 
that they can stay at a block boundary to make the subsequent reads/writes 
faster). So I think
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, len);
assertTrue(sot <= len);
{code}
should be replaced with
{code}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, got);
assertTrue(sot <= len);
{code}

  was (Author: jpountz):
The bulk get and set are not guaranteed to return/set exactly len longs (so 
that they can stay at a block boundary to make the subsequent reads/writes 
faster). So I think
{{code}}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, len);
assertTrue(sot <= len);
{{code}}
should be replaced with
{{code}}
int got = packed1.get(start, buffer, offset, len);
assertTrue(got <= len);
int sot = packed2.set(start, buffer, offset, got);
assertTrue(sot <= len);
{{code}}
  
> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287584#comment-13287584
 ] 

Michael McCandless commented on LUCENE-4098:


Aha, phew!  A bug in the test ... with that change the test looks like it's 
passing so far ... I'll beast it for a while.  Thanks Adrien.

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4098) Efficient bulk operations for packed integer arrays

2012-06-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4098:
---

Attachment: LUCENE-4098.patch

Patch, w/ the fix for the new random test (now passes).

I also tweaked the random test to sometimes use the PackedInts.copy method.

Finally, I changed PackedInts.copy to alloc a long[] of size min(capacity, len) 
so we don't over-allocate if the incoming mem is larger than it needs to be.

I think it's ready!  Thanks Adrien.
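
For anyone picking this up later, a hedged sketch of how the copy helper is 
meant to be used (names and sizes are illustrative; the {{mem}} argument caps 
the temporary buffer in bytes):

{code}
import org.apache.lucene.util.packed.PackedInts;

public class CopySketch {
  public static void main(String[] args) {
    PackedInts.Mutable src = PackedInts.getMutable(1000, 13, PackedInts.FAST);
    for (int i = 0; i < src.size(); ++i) {
      src.set(i, i % 8000); // 13 bits hold values up to 8191
    }
    // Destination uses COMPACT packing; copy() streams the values through a
    // temporary long[] whose size is capped by the mem argument (1024 here).
    PackedInts.Mutable dest =
        PackedInts.getMutable(src.size(), src.getBitsPerValue(), PackedInts.COMPACT);
    PackedInts.copy(src, 0, dest, 0, src.size(), 1024);
    System.out.println("copied " + dest.size() + " values");
  }
}
{code}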

> Efficient bulk operations for packed integer arrays
> ---
>
> Key: LUCENE-4098
> URL: https://issues.apache.org/jira/browse/LUCENE-4098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Adrien Grand
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4098.patch, LUCENE-4098.patch, LUCENE-4098.patch
>
>
> There are some places in Lucene code that {iterate over,set} ranges of values 
> of a packed integer array. Because bit-packing implementations (Packed*) tend 
> to be slower than direct implementations, this can take a lot of time.
> For example, under some scenarios, GrowableWriter can take most of its 
> (averaged) {{set}} time in resizing operations.
> However, some bit-packing schemes, such as the one that is used by 
> {{Packed64SingleBlock*}}, allow implementing efficient bulk operations such 
> as get/set/fill. Implementing these bulk operations in 
> {{PackedInts.{Reader,Mutable}}} and using them across other components 
> instead of their single-value counterparts could help improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-06-01 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287621#comment-13287621
 ] 

Steven Rowe commented on LUCENE-4092:
-

This one had syntax problems (a recent test failure notification email 
complained about it):

{noformat}
${BUILD_LOG_REGEX,regex="[ 
\\t]*\\[echo\\].*)*\\s*[1-9]\\d*\\s+Unknown\\s+Licenses.*",linesBefore=17,linesAfter=20}
{noformat}

I switched it to the following on all non-Maven Jenkins job configurations:

{noformat}
${BUILD_LOG_REGEX,regex="[ 
\\t]*\\[echo\\]\\s+[1-9]\\d*\\s+Unknown\\s+Licenses.*",linesBefore=17,linesAfter=20}
{noformat}
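
A quick way to see the difference, sketched in plain Java (illustrative only): 
the first pattern contains an unbalanced ')' and is rejected by the regex 
engine, which matches what the notification emails complained about:

{code}
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {
  public static void main(String[] args) {
    String broken = "[ \\t]*\\[echo\\].*)*\\s*[1-9]\\d*\\s+Unknown\\s+Licenses.*";
    String fixed = "[ \\t]*\\[echo\\]\\s+[1-9]\\d*\\s+Unknown\\s+Licenses.*";
    try {
      Pattern.compile(broken);
    } catch (PatternSyntaxException e) {
      // Fails with "Unmatched closing ')'"
      System.out.println("broken pattern rejected: " + e.getDescription());
    }
    Pattern.compile(fixed); // compiles cleanly
    System.out.println("fixed pattern OK");
  }
}
{code}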

> Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
> failures).
> 
>
> Key: LUCENE-4092
> URL: https://issues.apache.org/jira/browse/LUCENE-4092
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: general/test
>Reporter: Dawid Weiss
>Priority: Trivial
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4092) Check what's Jenkins pattern for e-mailing log fragments (so that it includes failures).

2012-06-01 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287627#comment-13287627
 ] 

Steven Rowe commented on LUCENE-4092:
-

{quote}
Spreading the BUILD_LOG_REGEX regex value over multiple lines is not supported 
by Jenkins's email templating functionality 
[...]
This could be fixed by allowing line terminators to be escaped:
[...]
I submitted a Jenkins JIRA issue for this: 
[https://issues.jenkins-ci.org/browse/JENKINS-13976].
{quote}

I forked the email-ext project on github and made a pull request, which has now 
been merged into master.


> Check what's Jenkins pattern for e-mailing log fragments (so that it includes 
> failures).
> 
>
> Key: LUCENE-4092
> URL: https://issues.apache.org/jira/browse/LUCENE-4092
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: general/test
>Reporter: Dawid Weiss
>Priority: Trivial
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2993) Integrate WordBreakSpellChecker with Solr

2012-06-01 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer reassigned SOLR-2993:


Assignee: James Dyer

> Integrate WordBreakSpellChecker with Solr
> -
>
> Key: SOLR-2993
> URL: https://issues.apache.org/jira/browse/SOLR-2993
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 4.0
>Reporter: James Dyer
>Assignee: James Dyer
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-2993.patch, SOLR-2993.patch, SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from 
> LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use 
> of shingle-based dictionaries.  
> - Seamlessly integrate word-break suggestions with single-word spelling 
> corrections from the existing FileBased-, IndexBased- or Direct- spell 
> checkers.  
> - Provide collation support for word-break errors including cases where the 
> user has a mix of single-word spelling errors and word-break errors in the 
> same query.  
> - Provide shard support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2993) Integrate WordBreakSpellChecker with Solr

2012-06-01 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2993:
-

Attachment: SOLR-2993.patch

Here is a new patch that can better handle collations involving mixed 
required/prohibited/optional terms and also boolean operators (AND/OR/NOT).

When combining words, we do not want to combine an optional term with a 
prohibited one, etc.  We also do not want to combine words that belong to 
different boolean clauses or those that were "NOT"ed to one another.

Likewise, when splitting a term into multiples, we want to ensure all the 
resulting terms are required if the original one was required, etc.  Also, if 
the query contains boolean operators (AND/OR/NOT), this version ANDs the split 
terms together.

In the case of Boolean operators, SpellingQueryConverter can only make a guess 
as to the best action.  It doesn't know the actual query parser used, the 
default "q.op" or "mm" setting, etc.  All this does is make a reasonable guess 
as to the best way to re-write the query if corrections involved combining 
and/or splitting words.
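
As a hypothetical before/after, just to illustrate the rewriting rules described 
above (these are not actual outputs from the patch):

{noformat}
original query:  +wi +fi router
collation:       +wifi router                (two required terms may be combined)

original query:  +solrcell printing
collation:       +(solr AND cell) printing   (a split required term is ANDed together)
{noformat}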

See WordBreakSpellCheckerTest#testCollate and 
SpellingQueryConverterTest#testRequiredOrProhibitedFlags for examples of how 
this works.

Unless there are other issues, I plan to commit this in a few days.

> Integrate WordBreakSpellChecker with Solr
> -
>
> Key: SOLR-2993
> URL: https://issues.apache.org/jira/browse/SOLR-2993
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 4.0
>Reporter: James Dyer
>Assignee: James Dyer
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-2993.patch, SOLR-2993.patch, SOLR-2993.patch, 
> SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from 
> LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use 
> of shingle-based dictionaries.  
> - Seamlessly integrate word-break suggestions with single-word spelling 
> corrections from the existing FileBased-, IndexBased- or Direct- spell 
> checkers.  
> - Provide collation support for word-break errors including cases where the 
> user has a mix of single-word spelling errors and word-break errors in the 
> same query.  
> - Provide shard support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3477) SOLR does not start up when no cores are defined

2012-06-01 Thread Tomás Fernández Löbbe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-3477:


Attachment: SOLR-3477-3_6.patch
SOLR-3477.patch

I checked this and saw the same as Tommaso: it seems to work on trunk and the 4x 
branch. 
I added a test case that starts the CoreContainer with no cores (a solr.xml file 
with an empty list of cores). It passes on trunk and fails on 3.6 with an 
exception like the one described in this issue.
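
For reference, a minimal solr.xml along the lines the test uses (an illustrative 
sketch, not the exact file from the patch):

{noformat}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- intentionally empty: no cores defined at startup -->
  </cores>
</solr>
{noformat}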

> SOLR does not start up when no cores are defined
> 
>
> Key: SOLR-3477
> URL: https://issues.apache.org/jira/browse/SOLR-3477
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6
> Environment: All environments
>Reporter: Sebastian Schaffert
>Priority: Critical
> Attachments: SOLR-3477-3_6.patch, SOLR-3477.patch
>
>
> Since version 3.6.0, Solr does not start up when no cores are defined in 
> solr.xml. The problematic code is in CoreContainer.java, lines 171-173.
> org.apache.solr.common.SolrException: No cores were created, please check the 
> logs for errors
>   at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
>  ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) 
> ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
> ...
> In our case, however, this is a valid situation, because we create the cores 
> programmatically by calling the web services to register new cores. The server 
> is initially started with no cores defined, and depending on the 
> configuration of our application, cores are then created dynamically.
> For the time being, we have to stick with version 3.5, which did not have 
> this problem (or feature).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14500 - Failure

2012-06-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14500/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=74 closes=73

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=73
at __randomizedtesting.SeedInfo.seed([39E592AB06F158EA]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:752)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 9946 lines...]
   [junit4]   2> 133 T1722 oashc.HttpShardHandlerFactory.getParameter Setting 
corePoolSize to: 0
   [junit4]   2> 133 T1722 oashc.HttpShardHandlerFactory.getParameter Setting 
maximumPoolSize to: 2147483647
   [junit4]   2> 133 T1722 oashc.HttpShardHandlerFactory.getParameter Setting 
maxThreadIdleTime to: 5
   [junit4]   2> 133 T1722 oashc.HttpShardHandlerFactory.getParameter Setting 
sizeOfQueue to: -1
   [junit4]   2> 133 T1722 oashc.HttpShardHandlerFactory.getParameter Setting 
fairnessPolicy to: false
   [junit4]   2> 134 T1722 oascsi.HttpClientUtil.createClient Creating new http 
client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
   [junit4]   2> 144 T1722 oasc.CoreContainer.create Creating SolrCore 
'collection1' using instanceDir: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338587201357/slave/.
   [junit4]   2> 144 T1722 oasc.SolrResourceLoader.<init> new 
SolrResourceLoader for directory: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338587201357/slave/./'
   [junit4]   2> 159 T1722 oasc.SolrConfig.<init> Using Lucene MatchVersion: 
LUCENE_50
   [junit4]   2> 191 T1722 oasc.SolrConfig.<init> Loaded SolrConfig: 
solrconfig.xml
   [junit4]   2> 192 T1722 oass.IndexSchema.readSchema Reading Solr Schema
   [junit4]   2> 194 T1722 oass.IndexSchema.readSchema Schema name=test
   [junit4]   2> 203 T1722 oass.IndexSchema.readSchema WARNING no default 
search field specified in schema.
   [junit4]   2> 204 T1722 oass.IndexSchema.readSchema unique key field: id
   [junit4]   2> 205 T1722 oasc.SolrCore.<init> [collection1] Opening new 
SolrCore at 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338587201357/slave/./,
dataDir=./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338587201357/slave/data/
   [junit4]   2> 205 T1722 oasc.SolrCore.<init> JMX monitoring not detected for 
core: collection1
   [junit4]   2> 206 T1722 oasc.SolrCore.initIndex WARNING [collection1] Solr 
index directory 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1338587201357/slave/data/index'
 doesn't exist. Creating new index...
   [junit4]   2> 219 T1722 oasc.SolrDeletionPolicy.onCommit 
SolrDeletionPolicy.onCommit: commits:num=1
   [junit4]   2>
commit{dir=/usr/h

[jira] [Resolved] (SOLR-2796) AddUpdateCommand.getIndexedId doesn't work with schema configured defaults/copyField - UUIDField/copyField can not be used as uniqueKey field

2012-06-01 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2796.


Resolution: Fixed
  Assignee: Hoss Man

Committed revision 1345376. - trunk
Committed revision 1345378. - 4x

Committed checks for these situations in IndexSchema along with explicit 
error messages.  The commit also includes a CHANGES.txt upgrade note about using 
UUIDUpdateProcessorFactory to have uniqueKey values generated automatically; the 
note will need to be updated once a copy-field-esque update processor is 
available (tracked in SOLR-2599).

> AddUpdateCommand.getIndexedId doesn't work with schema configured 
> defaults/copyField - UUIDField/copyField can not be used as uniqueKey field
> -
>
> Key: SOLR-2796
> URL: https://issues.apache.org/jira/browse/SOLR-2796
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Blocker
> Fix For: 4.0
>
> Attachments: SOLR-2796.patch
>
>
> in Solr 1.4, and the HEAD of the 3x branch, the UUIDField can be used as the 
> uniqueKey field even if documents do not specify a value by taking advantage 
> of the {{default="NEW"}} feature of UUIDField.
> Similarly, a copyField can be used to populate the uniqueKey field with data 
> from some field with another name -- multiple copyFields can even be used if 
> there is no overlap (ie: if you have two differnet types of documents with no 
> overlap in their id space, you can copy from companyId->id and from 
> productId->id and use "id" as your uniqueKey field in solr)
> Neither of these approaches work in Solr trunk because of how 
> {{AddUpdateCommand.getIndexedId}} is currently used by the 
> DirectUpdateHander2 (see 
> [r1152500|http://svn.apache.org/viewvc?view=revision&revision=1152500]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3349) Required field cannot be satisfied by copyField

2012-06-01 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3349.


Resolution: Fixed
  Assignee: Hoss Man

> Required field cannot be satisfied by copyField
> ---
>
> Key: SOLR-3349
> URL: https://issues.apache.org/jira/browse/SOLR-3349
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Benson Margulies
>Assignee: Hoss Man
>
> While trying to  diagnose another problem, I tried the following pair of 
> elements in my schema.xml:
> {noformat}
>  id
>  
> {noformat}
> I can't insert documents; I get:
> org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey 
> field: id

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org