Re: subtle Solr classloader problem

2013-05-26 Thread Robert Muir
This is not a bug. It's a broken config.
On May 26, 2013 11:51 AM, "Shawn Heisey"  wrote:
>
> While looking into SOLR-4852 and testing every conceivable lib
> permutation, I ran across a second problem; I'd like to know whether it
> should be considered a bug.
>
>
> https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13667025#comment-13667025
>
> What I was trying to do here was split my required jars between
> ${solr.solr.home}/lib and ${solr.solr.home}/foo ... the former directory
> is automatically used for libraries, the latter was added by
> sharedLib="foo" in my solr.xml.  Should this be a valid configuration?
> If not, perhaps we need to stop automatically including
> ${solr.solr.home}/lib.
>
> I run into the same problem (unable to find the ICUTokenizer class)
> whenever I split my jars, even though the icu analysis jar was not the
> jar that I moved.  When I first tried it, I moved the icu4j jar, but it
> also has the exact same problem when I move the mysql jar, which has
> nothing at all to do with ICU.
>
> Here's a Solr log (on an unpatched branch_4x) from when I moved the
> mysql jar from lib to foo.  You can see the jars that get loaded, so
> this should not be happening:
>
> http://apaste.info/6aK5
>
> If all the jars are in either lib or foo, everything works.
>
> Is this behavior a bug?  I am starting to think that this problem and
> the original SOLR-4852 issue are actually the same problem, and that it
> may not be a duplicate jar problem, but rather something specific and
> subtle with the ICU analysis components that happens when the
> classloader is replaced.
>
> Thanks,
> Shawn
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>


Re: SLF4J Binding Warnings

2013-05-26 Thread Robert Muir
I don't care what they say: of course, as a producer of logging jars, they
would encourage their use! There is a clear bias here! Meanwhile, I just
laugh when software ships five or six logging jars yet printing still doesn't work.
On May 26, 2013 11:14 PM, "David Smiley (@MITRE.org)" 
wrote:

> Interesting.  But the sysout-over-slf4j project declares:
>
>
> > The sysout-over-slf4j module is explicitly not intended to encourage the
> > use of System.out or System.err for logging purposes. There is a
> > significant performance overhead attached to its use, and as such it
> > should be considered a stop-gap for your own code until you can alter it
> > to use SLF4J directly, or a work-around for poorly behaving third party
> > modules.
>
> As far as Solr is concerned, SLF4J is good, IMO.  Adapters are available to
> log to basically anything, and the user is in control of that by providing
> their logging jar of choice.
>
> ~ David
>
>
> Robert Muir wrote
> > On Thu, May 23, 2013 at 12:29 PM, Shawn Heisey <
>
> > solr@
>
> > > wrote:
> >
> >>
> >> For logs that are in test code itself, using sysout or syserr is
> probably
> >> a good option.  The Solr code that is being tested will (in most cases)
> >> pull in a dependency on slf4j because Logger is ubiquitous.  That's what
> >> I
> >> was referring to.
> >>
> >>
> > I'm not sure it has to stay that way forever. For example, in trunk we could decide to
> > use jetty's logging class instead, so solr has no hard dependency on
> slf4j
> > at all.
> > If it's in the classpath it would get used, but otherwise stuff just goes
> > to
> > System.err.println.
> >
> > Or solr could just use System.err.println, and if someone wants logging
> > they can redirect it (e.g.
> > http://projects.lidalia.org.uk/sysout-over-slf4j/
> > ).
> >
> > Lots of possibilities to remove logging jars!
>
>
>
>
>
> -
>  Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SLF4J-Binding-Warnings-tp4064166p4066182.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
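A minimal sketch of the redirect Robert mentions, assuming sysout-over-slf4j's
documented one-line entry point (the package path here is from memory and
should be treated as an assumption):

{code}
import uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J;

public class SysoutRedirectDemo {
  public static void main(String[] args) {
    // After this call, System.out/System.err writes are routed to whatever
    // SLF4J binding happens to be on the classpath.
    SysOutOverSLF4J.sendSystemOutAndErrToSLF4J();
    System.out.println("now captured by the logging system");
  }
}
{code}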


[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 495 - Still Failing!

2013-05-26 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/495/
Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51879 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51879 refused
at 
__randomizedtesting.SeedInfo.seed([5B0D4A4BFB403453:F0F7575E249CB27D]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.a

Re: Numeric Multi-Valued Doc Values

2013-05-26 Thread David Smiley (@MITRE.org)
That's a schema issue.  Lucene doesn't really have one so there isn't a
definitive answer there.  For Solr, this ideally should be cleaner but it
isn't, last I checked a month ago.  You could poke around the TrieField code
but in the end you will probably end up making assumptions about your code
being consistent with what TrieField is doing :-/  Perhaps we need a new
JIRA issue to add an indexedToObject(BytesRef):Object method to FieldType. 
The default impl could call indexedToReadable() and return a String.

~ David


Steven Bower wrote
> What is the proper way to convert the value coming from SortedSetDocValues
> for a numeric field (Int/Long/Float/Double) to their actual numeric
> values... Basically I can get the ByteRef filled in with lookupOrd(...)
> but
> what is the proper way to take the contents of the BytesRef and get the
> int/long/etc.. value back from it?
> 
> thanks,
> 
> steve
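A sketch of one way to decode those bytes, assuming the SortedSetDocValues hold
the full-precision prefix-coded terms that TrieField writes - which is exactly
the kind of assumption about TrieField's encoding David warns about above:

{code}
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

// Decode a prefix-coded trie term back to a number (Lucene 4.x API).
static long decodeLong(SortedSetDocValues dv, long ord) {
  BytesRef term = new BytesRef();
  dv.lookupOrd(ord, term);
  // Only full-precision terms (prefix shift == 0) decode cleanly; the
  // lower-precision terms a trie field also indexes throw NumberFormatException.
  return NumericUtils.prefixCodedToLong(term);
}

// For the other numeric types:
//   int:    NumericUtils.prefixCodedToInt(term)
//   float:  NumericUtils.sortableIntToFloat(NumericUtils.prefixCodedToInt(term))
//   double: NumericUtils.sortableLongToDouble(NumericUtils.prefixCodedToLong(term))
{code}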





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Numeric-Multi-Valued-Doc-Values-tp4065423p4066190.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-5014) ANTLR Lucene query parser

2013-05-26 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667456#comment-13667456
 ] 

David Smiley commented on LUCENE-5014:
--

Interesting.  Just read your description but didn't look at the 2.5MB patch 
file ;-)

At Lucene Revolution I saw [a cool 
presentation|http://www.lucenerevolution.org/sites/default/files/Implementing%20a%20Custom%20Search%20Syntax%20using%20Solr%2C%20Lucene%20%26%20Parboiled.pdf]
 by [~berryman] that showed off using 
[Parboiled|https://github.com/sirthias/parboiled/wiki], which uses an 
innovative new approach to building a parser.  I was quite impressed by how easy 
it was to use compared to the classic incumbents (specifically ANTLR).  I am curious what 
you think, in relation to your aims in this patch.

> ANTLR Lucene query parser
> -
>
> Key: LUCENE-5014
> URL: https://issues.apache.org/jira/browse/LUCENE-5014
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser, modules/queryparser
>Affects Versions: 4.3
> Environment: all
>Reporter: Roman Chyla
>  Labels: antlr, query, queryparser
> Attachments: LUCENE-5014.txt, LUCENE-5014.txt
>
>
> I would like to propose a new way of building query parsers for Lucene.  
> Currently, most Lucene parsers are hard to extend because they are either 
> written in Java (i.e. the SOLR query parser, or edismax) or the parsing logic 
> is 'married' to the query building logic (i.e. the standard lucene parser, 
> generated by JavaCC) - which makes any extension really hard.
> A few years back, Lucene got the contrib/modern query parser (later renamed to 
> 'flexible'), yet that parser didn't become a star (it must be very confusing 
> for many users). However, that parsing framework is very powerful! And it is 
> a real pity that there aren't more parsers already using it - because it 
> allows us to add/extend/change almost any aspect of the query parsing. 
> So, if we combine ANTLR + queryparser.flexible, we can get a very powerful 
> framework for building almost any query language one can think of. And I hope 
> this extension can become useful.
> The details:
>  - every new query syntax is written in EBNF; it lives in separate files (and 
> can be tested/developed independently - using 'gunit')
>  - ANTLR parser generates parsing code (and it can generate parsers in 
> several languages, the main target is Java, but it can also do Python - which 
> may be interesting for pylucene)
>  - the parser generates AST (abstract syntax tree) which is consumed by a  
> 'pipeline' of processors, users can easily modify this pipeline to add a 
> desired functionality
>  - the new parser contains a few (very important) debugging functions; it can 
> print results of every stage of the build, generate AST's as graphical 
> charts; ant targets help to build/test/debug grammars
>  - I've tried to reuse the existing queryparser.flexible components as much 
> as possible, only adding new processors when necessary
> Assumptions about the grammar:
>  - every grammar must have one top parse rule called 'mainQ'
>  - parsers must generate AST (Abstract Syntax Tree)
> The structure of the AST is left open; there are components which make 
> assumptions about the shape of the AST (i.e. that MODIFIER is the parent of a 
> FIELD); however, users are free to choose/write different processors with 
> different assumptions about the AST shape.
> More documentation on how to use the parser can be seen here:
> http://29min.wordpress.com/category/antlrqueryparser/
> The parser was created more than a year ago and is used in production 
> (http://labs.adsabs.harvard.edu/adsabs/). Different dialects of query 
> languages (with proximity operators, functions, special logic, etc.) can be 
> seen here: 
> https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs
> https://github.com/romanchyla/montysolr/tree/master/contrib/invenio

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4774) Solr support Lucene Facets

2013-05-26 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667448#comment-13667448
 ] 

David Smiley commented on SOLR-4774:


The addition of hierarchical faceting is certainly a new feature users would 
appreciate.

I wonder how the performance compares to Solr's approach for equivalent 
features, e.g. typical field-value faceting.  The Lucene faceting module does 
more index time preparation to make faceting faster than doing it dynamically, 
and Solr's approach is roughly equivalent to Lucene's notion of dynamic 
faceting.  I learned these things thanks to [~mikemccand]'s blog.  But who 
knows how things will play out in practice.

> Solr support Lucene Facets
> --
>
> Key: SOLR-4774
> URL: https://issues.apache.org/jira/browse/SOLR-4774
> Project: Solr
>  Issue Type: New Feature
>Reporter: Bill Bell
>
> Since facets are now included in Lucene... 
> 1. Solr schema taxonomy glue
> 2. Switch query results to use this glue with a new param like 
> facet.lucene=true?
> Seems like a great enhancement !

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SLF4J Binding Warnings

2013-05-26 Thread David Smiley (@MITRE.org)
Interesting.  But the sysout-over-slf4j project declares:


> The sysout-over-slf4j module is explicitly not intended to encourage the
> use of System.out or System.err for logging purposes. There is a
> significant performance overhead attached to its use, and as such it
> should be considered a stop-gap for your own code until you can alter it
> to use SLF4J directly, or a work-around for poorly behaving third party
> modules.

As far as Solr is concerned, SLF4J is good, IMO.  Adapters are available to
log to basically anything, and the user is in control of that by providing
their logging jar of choice.

~ David


Robert Muir wrote
> On Thu, May 23, 2013 at 12:29 PM, Shawn Heisey <

> solr@

> > wrote:
> 
>>
>> For logs that are in test code itself, using sysout or syserr is probably
>> a good option.  The Solr code that is being tested will (in most cases)
>> pull in a dependency on slf4j because Logger is ubiquitous.  That's what
>> I
>> was referring to.
>>
>>
> I'm not sure it has to stay that way forever. For example, in trunk we could decide to
> use jetty's logging class instead, so solr has no hard dependency on slf4j
> at all.
> If it's in the classpath it would get used, but otherwise stuff just goes
> to
> System.err.println.
> 
> Or solr could just use System.err.println, and if someone wants logging
> they can redirect it (e.g.
> http://projects.lidalia.org.uk/sysout-over-slf4j/
> ).
> 
> Lots of possibilities to remove logging jars!





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLF4J-Binding-Warnings-tp4064166p4066182.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4861) Simple reflected cross site scripting vulnerability

2013-05-26 Thread John Menerick (JIRA)
John Menerick created SOLR-4861:
---

 Summary: Simple reflected cross site scripting vulnerability
 Key: SOLR-4861
 URL: https://issues.apache.org/jira/browse/SOLR-4861
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.3, 4.2
 Environment: Requires web ui / Jetty Solr to be exploited.
Reporter: John Menerick


There exists a simple reflected XSS via the 404 Jetty / Solr code.  Within 
JettySolrRunner.java, line 465, if someone requests a non-existent page / URL 
that contains malicious code, the "Can not find" response can be escaped and the 
malicious code will be executed in the victim's browser. 
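A minimal sketch of the kind of output escaping that would mitigate this class
of bug; the method and the lines it wraps are illustrative, not Solr's actual
code:

{code}
// Escape user-controlled text before echoing it into an HTML error page.
static String escapeHtml(String s) {
  StringBuilder sb = new StringBuilder(s.length());
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    switch (c) {
      case '<':  sb.append("&lt;");   break;
      case '>':  sb.append("&gt;");   break;
      case '&':  sb.append("&amp;");  break;
      case '"':  sb.append("&quot;"); break;
      case '\'': sb.append("&#39;");  break;
      default:   sb.append(c);
    }
  }
  return sb.toString();
}

// Hypothetical before/after at the 404 handler:
//   writer.write("Can not find: " + req.getRequestURI());              // reflected XSS
//   writer.write("Can not find: " + escapeHtml(req.getRequestURI()));  // escaped
{code}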

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Join Queries Scores broken?

2013-05-26 Thread David Smiley (@MITRE.org)
Hi Kranti,

I think this post belongs on the solr-user list.  Anyway, Solr join queries
don't score.  Filter queries don't score either, so even if join queries did,
using them in a filter query wouldn't help.  For what it's worth, I
implemented a Solr scoring join query for a customer by basing it on the
scoring join query code in Lucene's "join" module.  You could do the same. 
My requirements were more extensive, but if all you need is a working
scoring join query, there isn't that much to it.

Cheers,
  David


Kranti™ K K Parisa wrote
> Hi,
> 
> I am trying to score/rank the results based on the boosting values
> specified with the Join queries.
> 
> Example:
> http://localhost:8983/solr/masterCore/select?q=a*&fq=(({!join
> fromIndex=childCore1
> from=parentId to=id v=$subQ1}) OR ({!join fromIndex=childCore2
> from=authorId to=id
> v=$subQ2}))&subQ1=(name:xyz^100)&subQ2=(author:xyz)&fl=id,score
> 
> So the matching documents from first join query should be getting higher
> score than the ones from the second join query.
> 
> But it is not working as expected, do I need to specify any other
> parameters? if it's an issue, shall I create a Jira ticket?
> Thanks & Regards,
> Kranti K Parisa
> http://www.linkedin.com/in/krantiparisa
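For reference, a minimal sketch of the scoring join in Lucene's "join" module,
reusing the field names from the example above; fromSearcher and toSearcher are
assumed to be IndexSearchers over the child and parent indexes:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

static TopDocs scoringJoin(IndexSearcher fromSearcher, IndexSearcher toSearcher)
    throws java.io.IOException {
  Query fromQuery = new TermQuery(new Term("name", "xyz"));
  // Join docs whose "parentId" matches another doc's "id", carrying the best
  // from-side score across the join; ScoreMode.None would mimic Solr's {!join}.
  Query joinQuery = JoinUtil.createJoinQuery(
      "parentId", false /* multipleValuesPerDocument */, "id",
      fromQuery, fromSearcher, ScoreMode.Max);
  return toSearcher.search(joinQuery, 10);
}
{code}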





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-Queries-Scores-broken-tp4065827p4066180.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Securing Lucene indexes

2013-05-26 Thread SUJIT PAL
Hi Lance,

I did a proof of concept where I stored the main document encrypted in a 
MongoDB database while the index contained only unstored versions of the data. I 
built a Solr component, set up as the last component in the chain, that took the 
docIds and queried the MongoDB database to build the documents and render them. 
I know of at least one installation where they do something very similar.

-sujit

On May 26, 2013, at 5:59 PM, Lance Norskog wrote:

> I would like to store Lucene indexes in an encrypted format. The only 
> security requirement is that if an intruder copies files from the file 
> system, no file will have raw data. It is acceptable for raw data to be 
> visible in raw disk scans. All I want to do is encrypt the readable index 
> files. 
> 
> Here is one way to encrypt Lucene indexes: encrypt the entire file on disk 
> and store the decrypted version in memory. This is ok with a RAMDirectory, 
> but does not scale. Using a little-known feature of Posix, it is possible to 
> create a memory-mapped file with a raw copy of the data which cannot be found 
> from the file system. The Posix feature is that when you open a file and then 
> delete it, the file still exists in the file system but is not visible 
> through the file system. The data exists as an invisible file in the file 
> system, and the file is deleted when you close the file descriptor. (This 
> does not work on Windows.) Let's call this a 'ghost file'. 
> 
> If memory-mapping works with ghost files, this seems like it should work: a 
> new Directory class will create a file and immediately delete it, then 
> memory-map it. The memory-mapped file will stay allocated inside the JVM 
> until the JVM closes the associated Directory object. The Directory class 
> would create an entire 'ghost Lucene index'.
> 
> This sequence opens an index:
> * open encrypted segment file in memory-mapped format
> * create ghost memory-mapped file
> * decrypt from encrypted memory into ghost file memory
> * close the encrypted index file
> Directory.close() wipes the ghost file data, closes the ghost file,  and the 
> file system reclaims the disk space.
> 
> This sequence creates an index:
> Directory.createOutput makes a ghost file and a real file.
> All data is saved to the ghost file.
> Close on the file encrypts the ghost file data into the real file, and wipes 
> the ghost data.
> Both files are then closed.
> 
> One glaring flaw is: what if close() is not called? The raw data will still 
> exist in the free disk space.
> There are two cases where this would happen:
> 1) the user fails to call close() but the program finishes normally. This can 
> be countered by adding a finalize() method that makes sure to clear the 
> memory.
> 2) the JVM fails and shutdown code is not run. The freed ghost data is on the 
> hard disk in the free disk space. It can only be found by scanning the raw 
> disks. One counter to this is to run the app in a virtual machine which does 
> not have access to the raw disk drivers. 
> 
> Is this a workable design? Are there any quirks of the Directory abstraction 
> that make this impossible or pointless? Or quirks in memory-mapped files or 
> how the JVM implements them?
> 
> Thanks for your time,
> 
> Lance Norskog
> 
> 
> 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Securing Lucene indexes

2013-05-26 Thread Lance Norskog
I would like to store Lucene indexes in an encrypted format. The only
security requirement is that if an intruder copies files from the file
system, no file will have raw data. It is acceptable for raw data to be
visible in raw disk scans. All I want to do is encrypt the readable index
files.

Here is one way to encrypt Lucene indexes: encrypt the entire file on disk
and store the decrypted version in memory. This is ok with a RAMDirectory,
but does not scale. Using a little-known feature of Posix, it is possible
to create a memory-mapped file with a raw copy of the data which cannot be
found from the file system. The Posix feature is that when you open a file
and then delete it, the file still exists in the file system but is not
visible through the file system. The data exists as an invisible file in
the file system, and the file is deleted when you close the file
descriptor. (This does not work on Windows.) Let's call this a 'ghost
file'.

If memory-mapping works with ghost files, this seems like it should work: a
new Directory class will create a file and immediately delete it, then
memory-map it. The memory-mapped file will stay allocated inside the JVM
until the JVM closes the associated Directory object. The Directory class
would create an entire 'ghost Lucene index'.
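A minimal sketch of the ghost-file idea on a POSIX system: create a file,
unlink it while a descriptor is still open, then memory-map it. The names here
are illustrative, not an actual Lucene Directory implementation:

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class GhostFileDemo {
  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("ghost", ".bin");
    RandomAccessFile raf = new RandomAccessFile(f, "rw");
    if (!f.delete()) {                 // unlink; data lives on while raf is open
      throw new IOException("could not unlink " + f);
    }
    raf.setLength(1 << 20);            // 1 MB of invisible scratch space
    MappedByteBuffer buf =
        raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
    buf.putInt(0, 42);                 // decrypted index data would go here
    System.out.println(buf.getInt(0)); // prints 42
    raf.close();                       // space reclaimed; nothing left on disk
  }
}
{code}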

This sequence opens an index:
* open encrypted segment file in memory-mapped format
* create ghost memory-mapped file
* decrypt from encrypted memory into ghost file memory
* close the encrypted index file
Directory.close() wipes the ghost file data, closes the ghost file,  and
the file system reclaims the disk space.

This sequence creates an index:
Directory.createOutput makes a ghost file and a real file.
All data is saved to the ghost file.
Close on the file encrypts the ghost file data into the real file, and
wipes the ghost data.
Both files are then closed.

One glaring flaw is: what if close() is not called? The raw data will still
exist in the free disk space.
There are two cases where this would happen:
1) the user fails to call close() but the program finishes normally. This
can be countered by adding a finalize() method that makes sure to clear the
memory.
2) the JVM fails and shutdown code is not run. The freed ghost data is on
the hard disk in the free disk space. It can only be found by scanning the
raw disks. One counter to this is to run the app in a virtual machine which
does not have access to the raw disk drivers.

Is this a workable design? Are there any quirks of the Directory
abstraction that make this impossible or pointless? Or quirks in
memory-mapped files or how the JVM implements them?

Thanks for your time,

Lance Norskog






-- 
Lance Norskog
goks...@gmail.com


[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 494 - Still Failing!

2013-05-26 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/494/
Java: 64bit/jdk1.6.0 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51867 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51867 refused
at 
__randomizedtesting.SeedInfo.seed([5CB057324DBDD82A:F74A4A2792615E04]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 

[jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier

2013-05-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667419#comment-13667419
 ] 

Michael McCandless commented on LUCENE-5012:


I committed some changes:

  * Got charFilter working

  * Fixed a few bugs in SynFilterStage not clearing its state on end
/ reset

  * Created a new fun stage: InsertDeletedPunctuationStage.  This
stage detects when a tokenizer has skipped over punctuation chars
and inserts a deleted token representing the punctuation, e.g. to
prevent a synonym or phrase query from matching over the
punctuation.  I had previously thought we would need to modify
Tokenizers to do this but now I think maybe this Stage could do it
for any Tokenizer ...


> Make graph-based TokenFilters easier
> 
>
> Key: LUCENE-5012
> URL: https://issues.apache.org/jira/browse/LUCENE-5012
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-5012.patch
>
>
> SynonymFilter has two limitations today:
>   * It cannot create positions, so eg dns -> domain name service
> creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and
> others).
>   * It cannot consume a graph, so e.g. if you try to apply synonyms
> after Kuromoji tokenizer I'm not sure what will happen.
> I've thought about how to fix these issues but it's really quite
> difficult with the current PosInc/PosLen graph representation, so I'd
> like to explore an alternative approach.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier

2013-05-26 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667418#comment-13667418
 ] 

Commit Tag Bot commented on LUCENE-5012:


[lucene5012 commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1486483

LUCENE-5012: add CharFilter, fix some bugs with SynFilter, add new 
InsertDeletedPunctuationStage

> Make graph-based TokenFilters easier
> 
>
> Key: LUCENE-5012
> URL: https://issues.apache.org/jira/browse/LUCENE-5012
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-5012.patch
>
>
> SynonymFilter has two limitations today:
>   * It cannot create positions, so eg dns -> domain name service
> creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and
> others).
>   * It cannot consume a graph, so e.g. if you try to apply synonyms
> after Kuromoji tokenizer I'm not sure what will happen.
> I've thought about how to fix these issues but it's really quite
> difficult with the current PosInc/PosLen graph representation, so I'd
> like to explore an alternative approach.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #339: POMs out of sync

2013-05-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/339/

4 tests failed.
REGRESSION:  
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
There are still nodes recoverying - waited for 230 seconds

Stack Trace:
java.lang.AssertionError: There are still nodes recoverying - waited for 230 
seconds
at 
__randomizedtesting.SeedInfo.seed([93CD86577A2B799C:122B084F0D7419A0]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:173)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:131)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:126)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCollectionsAPI(CollectionsAPIDistributedZkTest.java:512)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:146)


FAILED:  
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.org.apache.solr.cloud.CollectionsAPIDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest: 1) Thread[id=5467, 
name=recoveryCmdExecutor-3163-thread-1, state=RUNNABLE, 
group=TGRP-CollectionsAPIDistributedZkTest] at 
java.net.PlainSocketImpl.socketConnect(Native Method) at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
 at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)  
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at 
java.net.Socket.connect(Socket.java:546) at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
 at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
 at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
 at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
 at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:298) 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:679)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.cloud.CollectionsAPIDistributedZkTest: 
   1) Thread[id=5467, name=recoveryCmdExecutor-3163-thread-1, state=RUNNABLE, 
group=TGRP-CollectionsAPIDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrSer

[jira] [Commented] (SOLR-4816) Add "directUpdate" capability to CloudSolrServer

2013-05-26 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667400#comment-13667400
 ] 

Joel Bernstein commented on SOLR-4816:
--

The main backwards compatibility issue that I see is the compound response. 
This patch returns a response that contains the responses from each of the 
shard requests. This is the main reason that I made the directUpdate 
functionality optional.

The exception handling seems to be backwards compatible.

A separate implementation makes sense too. The CloudSolrServer works fine for 
less demanding indexing needs. The ConcurrentCloudSolrServer could be used for 
higher throughput.
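For context, a minimal sketch of stock 4.x CloudSolrServer usage - today the
client sends the update to some node and Solr forwards it to the right shard
leader, whereas with this patch's optional directUpdate behavior the client
would route it to that leader itself (the patch's exact switch is not shown
here):

{code}
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexDemo {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title_s", "Hello World");
    server.add(doc);    // without the patch, the receiving node forwards this
    server.commit();
    server.shutdown();
  }
}
{code}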

  

> Add "directUpdate" capability to CloudSolrServer
> 
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue changes CloudSolrServer so it can optionally route update requests 
> to the correct shard. This would be a nice feature to have to eliminate the 
> document routing overhead on the Solr servers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4860) MoreLikeThisHandler doesn't work with numeric or date fields in 4.x

2013-05-26 Thread Thomas Seidl (JIRA)
Thomas Seidl created SOLR-4860:
--

 Summary: MoreLikeThisHandler doesn't work with numeric or date 
fields in 4.x
 Key: SOLR-4860
 URL: https://issues.apache.org/jira/browse/SOLR-4860
 Project: Solr
  Issue Type: Bug
  Components: MoreLikeThis
Affects Versions: 4.2
Reporter: Thomas Seidl


After upgrading to Solr 4.2 (from 3.x), I realized that my MLT queries no 
longer work. It happens if I pass an integer ({{solr.TrieIntField}}), float 
({{solr.TrieFloatField}}) or date ({{solr.DateField}}) field as part of the 
{{mlt.fl}} parameter. The field's {{multiValued}} setting doesn't seem to 
matter.

This is the error I get:
{{NumericTokenStream does not support CharTermAttribute.

java.lang.IllegalArgumentException: NumericTokenStream does not support 
CharTermAttribute.
at 
org.apache.lucene.analysis.NumericTokenStream$NumericAttributeFactory.createAttributeInstance(NumericTokenStream.java:136)
at 
org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:781)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:724)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.like(MoreLikeThis.java:578)
at 
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:348)
at 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:167)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:679)
}}

The configuration I use can be found here: 
[http://drupalcode.org/project/search_api_solr.git/tree/HEAD:/solr-conf/4.x]

If I just misconfigured something, then sorry and please tell me what I'd need 
to change. Any help would be appreciated!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4860) MoreLikeThisHandler doesn't work with numeric or date fields in 4.x

2013-05-26 Thread Thomas Seidl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Seidl updated SOLR-4860:
---

Description: 
After upgrading to Solr 4.2 (from 3.x), I realized that my MLT queries no 
longer work. It happens if I pass an integer ({{solr.TrieIntField}}), float 
({{solr.TrieFloatField}}) or date ({{solr.DateField}}) field as part of the 
{{mlt.fl}} parameter. The field's {{multiValued}} setting doesn't seem to 
matter.

This is the error I get:
{noformat}
NumericTokenStream does not support CharTermAttribute.

java.lang.IllegalArgumentException: NumericTokenStream does not support 
CharTermAttribute.
at 
org.apache.lucene.analysis.NumericTokenStream$NumericAttributeFactory.createAttributeInstance(NumericTokenStream.java:136)
at 
org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:781)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:724)
at 
org.apache.lucene.queries.mlt.MoreLikeThis.like(MoreLikeThis.java:578)
at 
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:348)
at 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:167)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:679)
{noformat}

The configuration I use can be found here: 
[http://drupalcode.org/project/search_api_solr.git/tree/HEAD:/solr-conf/4.x]

If I just misconfigured something, then sorry and please tell me what I'd need 
to change. Any help would be appreciated!

  was:
After upgrading to Solr 4.2 (from 3.x), I realized that my MLT queries no 
longer work. It happens if I pass an integer ({{solr.TrieIntField}}), float 
({{solr.TrieFloatField}}) or date ({{solr.DateField}}) field as part of the 
{{mlt.fl}} parameter. The field's {{multiValued}} setting doesn't seem to 
matter.

This is the error I get:
{{NumericTokenStream does not support CharTermAttribute.

java.lang.IllegalArgumentException: NumericTokenStream does not support 
CharTermAttribute.
at 
org.apache.lucene.analysis.NumericTo

subtle Solr classloader problem

2013-05-26 Thread Shawn Heisey
While looking into SOLR-4852 and testing every conceivable lib
permutation, I ran across a second problem; I'd like to know whether it
should be considered a bug.

https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13667025#comment-13667025

What I was trying to do here was split my required jars between
${solr.solr.home}/lib and ${solr.solr.home}/foo ... the former directory
is automatically used for libraries, the latter was added by
sharedLib="foo" in my solr.xml.  Should this be a valid configuration?
If not, perhaps we need to stop automatically including
${solr.solr.home}/lib.

I run into the same problem (unable to find the ICUTokenizer class)
whenever I split my jars, even though the icu analysis jar was not the
jar that I moved.  When I first tried it, I moved the icu4j jar, but it
also has the exact same problem when I move the mysql jar, which has
nothing at all to do with ICU.

Here's a Solr log (on an unpatched branch_4x) from when I moved the
mysql jar from lib to foo.  You can see the jars that get loaded, so
this should not be happening:

http://apaste.info/6aK5

If all the jars are in either lib or foo, everything works.

Is this behavior a bug?  I am starting to think that this problem and
the original SOLR-4852 issue are actually the same problem, and that it
may not be a duplicate jar problem, but rather something specific and
subtle with the ICU analysis components that happens when the
classloader is replaced.

Thanks,
Shawn

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib

2013-05-26 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4852:
---

Attachment: SOLR-4852-test-failhard.txt

When I use RuntimeException to fail hard because of a duplicate URL, two 
existing Solr tests fail.

I'm beginning to think that this is a problem specific to the ICU analysis 
components, something subtle that happens when the classloader is replaced.

> If sharedLib is set to lib, classloader fails to find classes in lib
> 
>
> Key: SOLR-4852
> URL: https://issues.apache.org/jira/browse/SOLR-4852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.4
> Environment: Linux bigindy5 2.6.32-358.6.1.el6.centos.plus.x86_64 #1 
> SMP Wed Apr 24 03:21:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.7.0_21"
> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>Reporter: Shawn Heisey
> Fix For: 5.0, 4.4
>
> Attachments: SOLR-4852.patch, SOLR-4852.patch, 
> SOLR-4852-test-failhard.txt
>
>
> I have some jars in the lib directory under solr.solr.home - DIH, ICU, and 
> MySQL.  If I set sharedLib in solr.xml to "lib" then the ICUTokenizer class 
> is not found, even though the jar is loaded (twice) during Solr startup.  If 
> I set sharedLib to another location that doesn't exist, the jars are only 
> loaded once and there is no problem.
> I'm using the old-style solr.xml on branch_4x revision 1485566.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4859) MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory don't do numeric comparison for numeric fields

2013-05-26 Thread Jack Krupansky (JIRA)
Jack Krupansky created SOLR-4859:


 Summary: MinFieldValueUpdateProcessorFactory and 
MaxFieldValueUpdateProcessorFactory don't do numeric comparison for numeric 
fields
 Key: SOLR-4859
 URL: https://issues.apache.org/jira/browse/SOLR-4859
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.3
Reporter: Jack Krupansky


MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory are 
advertised as supporting numeric comparisons, but this doesn't work - only 
string comparison is available - and numeric comparison doesn't seem possible in 
practice, although the unit tests show it working at the unit-test level.

The problem is that numeric processing is dependent on the SolrInputDocument 
containing a list of numeric values, but at least with both the current XML and 
JSON loaders, only string values can be loaded.
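
To see the gap concretely, here is a hand-built sketch (not code from the 
tests) of the two situations, using SolrJ's SolrInputDocument:

{code}
import org.apache.solr.common.SolrInputDocument;

// What the unit tests effectively exercise: typed values, so min/max
// comparison is numeric.
SolrInputDocument fromTest = new SolrInputDocument();
for (int size : new int[] {200, 999, 101, 199, 1000}) {
  fromTest.addField("sizes_i", size);
}

// What the XML/JSON loaders actually deliver: every value is a String,
// so min/max comparison is lexicographic.
SolrInputDocument fromLoader = new SolrInputDocument();
for (String size : new String[] {"200", "999", "101", "199", "1000"}) {
  fromLoader.addField("sizes_i", size);
}
{code}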

Test scenario.

1. Use Solr 4.3 example.
2. Add the following update processor chain to solrconfig.xml:

{code}
  <updateRequestProcessorChain name="max-only-num">
    <processor class="solr.MaxFieldValueUpdateProcessorFactory">
      <str name="fieldName">sizes_i</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
{code}

3. Perform this update request:

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/json' -d '
  [{"id": "doc-1",
"title_s": "Hello World",
"sizes_i": [200, 999, 101, 199, 1000]}]'
{code}

Note that the values are JSON integer values.

4. Perform this query:

{code}
curl "http://localhost:8983/solr/select/?q=*:*&indent=true&wt=json";
{code}

Shows this result:

{code}
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"doc-1",
"title_s":"Hello World",
"sizes_i":999,
"_version_":1436094187405574144}]
  }}
{code}

sizes_i should be 1000, not 999.
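
That is exactly what lexicographic comparison of the string forms produces; a 
quick plain-Java check (written here for illustration):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

List<String> values = Arrays.asList("200", "999", "101", "199", "1000");
// '9' sorts after '1', so "999" > "1000" when compared as strings.
System.out.println(Collections.max(values)); // prints 999
{code}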

Alternative update tests:

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/json' -d '
  [{"id": "doc-1",
"title_s": "Hello World",
"sizes_i": 200,
"sizes_i": 999,
"sizes_i": 101,
"sizes_i": 199,
"sizes_i": 1000}]'
{code}

and

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/xml' -d '
  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="title_s">Hello World</field>
      <field name="sizes_i">42</field>
      <field name="sizes_i">128</field>
      <field name="sizes_i">-3</field>
    </doc>
  </add>'
{code}

In XML, of course, there is no way for the input values to be anything other 
than strings ("text").

The JSON loader does parse the values with their type, but immediately converts 
the values to strings:

{code}
private Object parseSingleFieldValue(int ev) throws IOException {
  switch (ev) {
    case JSONParser.STRING:
      return parser.getString();
    case JSONParser.LONG:
    case JSONParser.NUMBER:
    case JSONParser.BIGNUMBER:
      return parser.getNumberChars().toString();
    case JSONParser.BOOLEAN:
      // for legacy reasons, single values are expected to be strings
      return Boolean.toString(parser.getBoolean());
    case JSONParser.NULL:
      parser.getNull();
      return null;
    case JSONParser.ARRAY_START:
      return parseArrayFieldValue(ev);
    default:
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "Error parsing JSON field value. Unexpected " + JSONParser.getEventString(ev));
  }
}

private List<Object> parseArrayFieldValue(int ev) throws IOException {
  assert ev == JSONParser.ARRAY_START;

  ArrayList<Object> lst = new ArrayList<Object>(2);
  for (;;) {
    ev = parser.nextEvent();
    if (ev == JSONParser.ARRAY_END) {
      return lst;
    }
    Object val = parseSingleFieldValue(ev);
    lst.add(val);
  }
}
{code}

Originally, I had hoped/expected that the schema type of the field would 
determine the type of min/max comparison - integer for a *_i field in my case.

The comparison logic for min:

{code}
public final class MinFieldValueUpdateProcessorFactory extends 
    FieldValueSubsetUpdateProcessorFactory {

  @Override
  @SuppressWarnings("unchecked")
  public Collection<Object> pickSubset(Collection<Object> values) {
    Collection<Object> result = values;
    try {
      result = Collections.singletonList
        (Collections.min((Collection) values));
    } catch (ClassCastException e) {
      throw new SolrException
        (BAD_REQUEST, 
         "Field values are not mutually comparable: " + e.getMessage(), e);
    }
    return result;
  }
{code}

This logic depends entirely on the runtime type of the input values, not on the 
schema field type itself.

It would be nice to at least have a comparison override: compareNumeric="true".
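
A rough sketch of what a numeric-aware override could look like (Java 6/7 
style; the compareNumeric approach and this implementation are illustrative, 
not an actual Solr API):

{code}
// Hypothetical: when compareNumeric="true", parse whatever the loader
// delivered and compare numerically instead of relying on the runtime type.
public Collection<Object> pickSubset(Collection<Object> values) {
  try {
    Object max = Collections.max(values, new Comparator<Object>() {
      @Override
      public int compare(Object a, Object b) {
        return Double.compare(Double.parseDouble(a.toString()),
                              Double.parseDouble(b.toString()));
      }
    });
    return Collections.singletonList(max);
  } catch (NumberFormatException e) {
    throw new SolrException(BAD_REQUEST,
        "Field values are not numeric: " + e.getMessage(), e);
  }
}
{code}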


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4258) Incremental Field Updates through Stacked Segments

2013-05-26 Thread Sivan Yogev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sivan Yogev updated LUCENE-4258:


Attachment: LUCENE-4258.branch.6.patch

New patch, all tests pass. After adding a test that mixes updates and deletes, 
I realized that mixing should be constrained: it opens the door to complex 
scenarios and would force changes to the way deletes are saved, which is not a 
good idea. So I added a flag to IndexWriter that marks whether deletes are 
pending; when a fields update is added while the flag is set, an automatic 
commit is triggered. That way, in FrozenBufferedDeletes we can be sure that 
deletes are applied after updates.
I suspect my implementation is not safe - Shai & Mike, can you please take a look?
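
Roughly, the idea looks like this (a pseudo-sketch of the flag, not the actual 
patch; the helper names are made up):

{code}
// Inside IndexWriter (sketch): remember that deletes are buffered, and force
// a commit before accepting a fields update, so deletes always apply first.
private boolean deletesPending = false;

public synchronized void deleteDocuments(Term... terms) throws IOException {
  bufferDeletes(terms);                // made-up helper
  deletesPending = true;
}

public synchronized void updateFields(Term term, IndexDocument fields) throws IOException {
  if (deletesPending) {
    commit();                          // flush pending deletes first
    deletesPending = false;
  }
  bufferFieldsUpdate(term, fields);    // made-up helper
}
{code}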

> Incremental Field Updates through Stacked Segments
> --
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Sivan Yogev
> Fix For: 4.4
>
> Attachments: IncrementalFieldUpdates.odp, 
> LUCENE-4258-API-changes.patch, LUCENE-4258.branch.1.patch, 
> LUCENE-4258.branch.2.patch, LUCENE-4258.branch3.patch, 
> LUCENE-4258.branch.4.patch, LUCENE-4258.branch.5.patch, 
> LUCENE-4258.branch.6.patch, LUCENE-4258.r1410593.patch, 
> LUCENE-4258.r1412262.patch, LUCENE-4258.r1416438.patch, 
> LUCENE-4258.r1416617.patch, LUCENE-4258.r1422495.patch, 
> LUCENE-4258.r1423010.patch
>
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 492 - Still Failing!

2013-05-26 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/492/
Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51339 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51339 refused
at 
__randomizedtesting.SeedInfo.seed([342A32D8269798F6:9FD02FCDF94B1ED8]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org

[jira] [Commented] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib

2013-05-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667245#comment-13667245
 ] 

Robert Muir commented on SOLR-4852:
---

-1 to this lenient patch. If someone has a configuration error, fail hard. 

> If sharedLib is set to lib, classloader fails to find classes in lib
> 
>
> Key: SOLR-4852
> URL: https://issues.apache.org/jira/browse/SOLR-4852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.4
> Environment: Linux bigindy5 2.6.32-358.6.1.el6.centos.plus.x86_64 #1 
> SMP Wed Apr 24 03:21:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.7.0_21"
> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>Reporter: Shawn Heisey
> Fix For: 5.0, 4.4
>
> Attachments: SOLR-4852.patch, SOLR-4852.patch
>
>
> I have some jars in the lib directory under solr.solr.home - DIH, ICU, and 
> MySQL.  If I set sharedLib in solr.xml to "lib" then the ICUTokenizer class 
> is not found, even though the jar is loaded (twice) during Solr startup.  If 
> I set sharedLib to another location that doesn't exist, the jars are only 
> loaded once and there is no problem.
> I'm using the old-style solr.xml on branch_4x revision 1485566.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5015) Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator

2013-05-26 Thread Gilad Barkai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilad Barkai updated LUCENE-5015:
-

Attachment: LUCENE-5015.patch

Shai, I think you're right, a null {{SampleFixer}} makes more sense. 

While working on a test which validates that the flow works with a {{null}} 
fixer, I found that it did not. The reason is complements: by default, 
complement counting kicks in when enough results are found. I think this may 
hold the key to the performance differences as well.

Rod, could you please try the following code and report the results?

{code}
SamplingAccumulator accumulator = new SamplingAccumulator(
    new RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo);

// Make sure no complements are in action
accumulator.setComplementThreshold(StandardFacetsAccumulator.DISABLE_COMPLEMENT);

facetsCollector = FacetsCollector.create(accumulator);

{code}

In the meantime, I made the changes to the patch and added the test for the 
{{null}} fixer.

> Unexpected performance difference between SamplingAccumulator and 
> StandardFacetAccumulator
> --
>
> Key: LUCENE-5015
> URL: https://issues.apache.org/jira/browse/LUCENE-5015
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: 4.3
>Reporter: Rob Audenaerde
>Assignee: Shai Erera
>Priority: Minor
> Attachments: LUCENE-5015.patch, LUCENE-5015.patch, LUCENE-5015.patch, 
> LUCENE-5015.patch, LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator 
> and the StandardFacetAccumulator. 
> The case is an index with about 5M documents and each document containing 
> about 10 fields. I created a facet on each of those fields. When searching to 
> retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is 
> about twice as fast as the StandardFacetAccumulator. This is expected and a 
> nice speed-up. 
> However, when I use more CountFacetRequests to retrieve facet-counts for more 
> than one field, the speed of the SamplingAccumulator decreases, to the point 
> where the StandardFacetAccumulator is faster. 
> {noformat} 
> FacetRequests  SamplingStandard
>  1   391 ms 1100 ms
>  2   531 ms 1095 ms 
>  3   948 ms 1108 ms
>  4  1400 ms 1110 ms
>  5  1901 ms 1102 ms
> {noformat} 
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator 
> needs to do less work? 
> Some code to show what I do:
> {code}
>   searcher.search( facetsQuery, facetsCollector );
>   final List collectedFacets = 
> facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( 
> facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
>   facetsCollector =
>   FacetsCollector.create( new SamplingAccumulator( new 
> RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> else
> {
>   facetsCollector = FacetsCollector.create( FacetsAccumulator.create( 
> facetSearchParams, searcher.getIndexReader(), taxo ) );
> {code}
>   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4744) Version conflict error during shard split test

2013-05-26 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4744:


Attachment: SOLR-4744.patch

Changes:
# New syncAdd and syncDelete methods in SolrCmdDistributor which add/delete 
synchronously and propagate exceptions
# DistributedUpdateProcessor calls cmdDistrib.syncAdd inside versionAdd because 
that's the only place where we have the version, the full doc, and an 
opportunity to do a remote synchronous add before the local add
# Similarly, cmdDistrib.syncDelete is called by versionDelete and doDeleteByQuery
# ShardSplitTest now also tests delete-by-id

With these changes, any exception while forwarding updates to sub-shard leaders 
results in an exception being thrown to the client, which can then retry the 
operation.

A code review would be helpful.

Considering that without this fix shard splitting can, in some cases, lead to 
data loss, we should add this to 4.3.1.
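
For reviewers, the synchronous path is roughly shaped like this (sketch only; 
the real changes live in SolrCmdDistributor, and the helper names here are 
invented):

{code}
// Forward the add to each node and block for the response, so a failure
// surfaces as an exception to the caller instead of being swallowed by the
// asynchronous executor.
public void syncAdd(AddUpdateCommand cmd, List<Node> nodes,
                    ModifiableSolrParams params) throws IOException {
  for (Node node : nodes) {
    try {
      sendSynchronously(new AddRequest(cmd, params), node); // invented helper
    } catch (Exception e) {
      throw new IOException("Forwarding update to " + node + " failed", e);
    }
  }
}
{code}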

> Version conflict error during shard split test
> --
>
> Key: SOLR-4744
> URL: https://issues.apache.org/jira/browse/SOLR-4744
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4744.patch
>
>
> ShardSplitTest fails sometimes with the following error:
> {code}
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> invoked for collection: collection1
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1 
> to inactive
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> shard1_0 to active
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> shard1_1 to active
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.873; 
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
> path=/update params={wt=javabin&version=2} {add=[169 (1432319507166134272)]} 
> 0 2
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.884; 
> org.apache.solr.update.processor.LogUpdateProcessor; 
> [collection1_shard1_1_replica1] webapp= path=/update 
> params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2}
>  {} 0 1
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.885; 
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
> path=/update 
> params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2}
>  {add=[169 (1432319507173474304)]} 0 2
> [junit4:junit4]   1> ERROR - 2013-04-14 19:05:26.885; 
> org.apache.solr.common.SolrException; shard update error StdNode: 
> http://127.0.0.1:41028/collection1_shard1_1_replica1/:org.apache.solr.common.SolrException:
>  version conflict for 169 expected=1432319507173474304 actual=-1
> [junit4:junit4]   1>  

[jira] [Commented] (LUCENE-5015) Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator

2013-05-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667223#comment-13667223
 ] 

Shai Erera commented on LUCENE-5015:


Thanks Gilad. Now that we have SampleFixer on SamplingParams, I wonder why we 
need Noop and Amortized? Could we just make the default fixer null and not 
oversample + fix if it's null? And Amortized ... well as you said, it looks 
kind of redundant now... I don't think it's rocket science for an app to do 
value/ratio on its own, yet it's one more class that we need to maintain going 
forward?

> Unexpected performance difference between SamplingAccumulator and 
> StandardFacetAccumulator
> --
>
> Key: LUCENE-5015
> URL: https://issues.apache.org/jira/browse/LUCENE-5015
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: 4.3
>Reporter: Rob Audenaerde
>Assignee: Shai Erera
>Priority: Minor
> Attachments: LUCENE-5015.patch, LUCENE-5015.patch, LUCENE-5015.patch, 
> LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator 
> and the StandardFacetAccumulator. 
> The case is an index with about 5M documents and each document containing 
> about 10 fields. I created a facet on each of those fields. When searching to 
> retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is 
> about twice as fast as the StandardFacetAccumulator. This is expected and a 
> nice speed-up. 
> However, when I use more CountFacetRequests to retrieve facet-counts for more 
> than one field, the speed of the SamplingAccumulator decreases, to the point 
> where the StandardFacetAccumulator is faster. 
> {noformat} 
> FacetRequests  SamplingStandard
>  1   391 ms 1100 ms
>  2   531 ms 1095 ms 
>  3   948 ms 1108 ms
>  4  1400 ms 1110 ms
>  5  1901 ms 1102 ms
> {noformat} 
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator 
> needs to do less work? 
> Some code to show what I do:
> {code}
>   searcher.search( facetsQuery, facetsCollector );
>   final List collectedFacets = 
> facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( 
> facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
>   facetsCollector =
>   FacetsCollector.create( new SamplingAccumulator( new 
> RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> else
> {
>   facetsCollector = FacetsCollector.create( FacetsAccumulator.create( 
> facetSearchParams, searcher.getIndexReader(), taxo ) );
> {code}
>   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5015) Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator

2013-05-26 Thread Gilad Barkai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilad Barkai updated LUCENE-5015:
-

Attachment: LUCENE-5015.patch

True, looking at overSampleFactor is enough, but it's not obvious that 
TakmiFixer should be used with overSampleFactor > 1 to improve the chances of 
the top-k results being accurate.
I'll add some documentation w.r.t. this issue; I hope it will do.

The new patch defaults to {{NoopSampleFixer}}, which does not touch the results 
at all - if only the top-k is needed and the counts do not matter, this is the 
least expensive option. 
Also, if a percentage should be displayed instead of counts (i.e. how much of 
the result set matches this category), the sampled count out of the sample size 
yields the same result as the amortized fixed count out of the actual result 
set size. That might render the amortized fixer moot...
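
To make that concrete (numbers invented for illustration):

{code}
int sampleHits = 37, sampleSize = 1000, resultSetSize = 100000;
int amortizedHits = sampleHits * (resultSetSize / sampleSize);     // 3700
double pctFromSample    = 100.0 * sampleHits / sampleSize;         // 3.7%
double pctFromAmortized = 100.0 * amortizedHits / resultSetSize;   // 3.7% - identical
{code}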

The new patch also accounts for {{SampleFixer}} being set in {{SamplingParams}}.

> Unexpected performance difference between SamplingAccumulator and 
> StandardFacetAccumulator
> --
>
> Key: LUCENE-5015
> URL: https://issues.apache.org/jira/browse/LUCENE-5015
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: 4.3
>Reporter: Rob Audenaerde
>Assignee: Shai Erera
>Priority: Minor
> Attachments: LUCENE-5015.patch, LUCENE-5015.patch, LUCENE-5015.patch, 
> LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator 
> and the StandardFacetAccumulator. 
> The case is an index with about 5M documents and each document containing 
> about 10 fields. I created a facet on each of those fields. When searching to 
> retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is 
> about twice as fast as the StandardFacetAccumulator. This is expected and a 
> nice speed-up. 
> However, when I use more CountFacetRequests to retrieve facet-counts for more 
> than one field, the speed of the SamplingAccumulator decreases, to the point 
> where the StandardFacetAccumulator is faster. 
> {noformat} 
> FacetRequests  SamplingStandard
>  1   391 ms 1100 ms
>  2   531 ms 1095 ms 
>  3   948 ms 1108 ms
>  4  1400 ms 1110 ms
>  5  1901 ms 1102 ms
> {noformat} 
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator 
> needs to do less work? 
> Some code to show what I do:
> {code}
>   searcher.search( facetsQuery, facetsCollector );
>   final List collectedFacets = 
> facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( 
> facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
>   facetsCollector =
>   FacetsCollector.create( new SamplingAccumulator( new 
> RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> else
> {
>   facetsCollector = FacetsCollector.create( FacetsAccumulator.create( 
> facetSearchParams, searcher.getIndexReader(), taxo ) );
> {code}
>   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org