Re: CLUSTERSTATE timeout

2015-04-14 Thread adfel70
I'm having the same issue with 4.10.3

I'm performing various tasks against the CLUSTERSTATE API and getting random
timeouts throughout the day.





Re: Solr Lazy startup - load-on-startup missing from web.xml?

2015-04-14 Thread Gili Nachum
Hi, it worked!
The issue was originally on WAS 7, but had somehow resurfaced on WebSphere
8.5.
Thanks.

On Thu, Feb 19, 2015 at 10:13 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:

 : Hi! Solr is starting up dormant for me until a client wakes it up with a
 : REST request, or I open the admin UI; only then does the remaining
 : initialization happen.
 : Is it a known issue?

 based on my recollection of the servlet spec, that sounds like a
 bug/glitch/config option in your Servlet container...

 Googling "WebSphere init Filters on startup" turns up this IBM bug report
 with noted fix versions...
 http://www-01.ibm.com/support/docview.wss?uid=swg1PK86553


 : I can't see any load-on-startup in the web.xml, in Solr.war.

 The bulk of Solr exists as a Filter.  Filters are not permitted
 by the servlet spec to specify a load-on-startup value (only
 Servlets can specify that, and the only Servlets in Solr are for
 supporting legacy paths -- the load order doesn't matter for them).
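
 For illustration, the servlet API gives a filter nowhere to ask for eager
 initialization -- a generic sketch, not Solr's actual code:

 import javax.servlet.*;
 import java.io.IOException;

 public class ExampleFilter implements Filter {
   // A <filter> declaration in web.xml, unlike a <servlet> declaration,
   // has no load-on-startup element, so there is no portable way to force
   // the container to call init() at startup.
   public void init(FilterConfig config) throws ServletException {
     // expensive startup work lands here, whenever the container runs it
   }
   public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
       throws IOException, ServletException {
     chain.doFilter(req, res);
   }
   public void destroy() {}
 }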


 : Running Solr 4.7.2 over WebSphere 8.5
 :
 : App loading message as the server starts up:
 : [2/16/15 12:17:19:956 GMT] 0056 ApplicationMg A   WSVR0221I:
 : Application started: solr-4.7.2
 : [2/16/15 12:17:20:319 GMT] 0001 WsServerImpl  A   WSVR0001I:
 : Server serverSolr open for e-business
 : The next startup message in the log is on the next day, once I enter the
 : Solr admin UI:
 : [2/17/15 10:20:13:827 GMT] 0098 SolrDispatchF I
 : org.apache.solr.servlet.SolrDispatchFilter init SolrDispatchFilter.init()
 : ...
 :

 -Hoss
 http://www.lucidworks.com/



Re: Java.net.socketexception: broken pipe Solr 4.10.2

2015-04-14 Thread Shawn Heisey
On 4/13/2015 10:11 PM, vsilgalis wrote:
 just a couple of notes:
 this a 2 shard setup with 2 nodes per shard.
 
 Currently these are on VMs with 8 cores and 8GB of ram each (java max heap
 is ~5588mb but we usually never even get that high) backed by a NFS file
 store which we store the indexes on (netapp SAN with nfs exports on SAS
 disk).

Broken pipe errors usually indicate that the client gave up waiting for
the server and disconnected the TCP connection before the server
completed processing and sent a response.  This is frequently because of
configured timeouts on the client.  If reasonable timeouts are being
exceeded, it's usually a performance problem.
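
If the client is SolrJ, the connection and socket timeouts are the usual
knobs.  A minimal SolrJ 4.x sketch (the values shown are illustrative, not
recommendations):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClientTimeouts {
  public static void main(String[] args) {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // If these are shorter than the server's worst-case response time, the
    // client hangs up early and the server logs a broken pipe.
    server.setConnectionTimeout(5000); // ms allowed to establish the TCP connection
    server.setSoTimeout(60000);        // ms allowed to wait for data on the socket
  }
}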

You haven't indicated how much disk space is occupied by the index data
on each of these servers.  There are also several other things that
would be helpful to know.

Please read this wiki page, then come back with any questions you might
have, and I may also ask a question or two:

http://wiki.apache.org/solr/SolrPerformanceProblems

My immediate suspects are an OS disk cache that is too small, and/or
problems with garbage collection pauses.  These are two of the issues
discussed on that wiki page.

Thanks,
Shawn



Problem related to filter on Zero value for DateField

2015-04-14 Thread Ali Nazemian
Dears,
Hi,
I have a strange problem with Solr 4.10.x. When I search on the Solr zero
date, which is "0002-11-30T00:00:00Z", and more than one filter is applied,
the results become invalid. For example, consider this scenario:
When I search for a document with fq=p_date:"0002-11-30T00:00:00Z", Solr
returns three different documents, which is right for my collection. All of
these three documents have the same value of 7 for document status. Now if I
search with fq=document_status:7, the same three documents are returned, which
is also a correct response. But when I search with
fq=document_status:7&fq=p_date:"0002-11-30T00:00:00Z", Solr returns
nothing (0 documents)! I have no such problem with date values other than
the Solr zero date ("0002-11-30T00:00:00Z"). Please let me know whether this
is a bug in Solr or I did something wrong.
Best regards.

-- 
A.Nazemian


Re: Securing solr index

2015-04-14 Thread Per Steffensen

Hi

I might misunderstand you, but if you are talking about securing the 
actual files/folders of the index, I do not think this is a Solr/Lucene 
concern. Use standard mechanisms of your OS. E.g. on linux/unix use 
chown, chgrp, chmod, sudo, apparmor etc - e.g. allowing only root to 
write the folders/files and sudo the user running Solr/Lucene to operate 
as root in this area. Even admins should not (normally) operate as root 
- that way they cannot write the files either. No one knows the 
root-password - except maybe for the super-super-admin, or you split the 
root-password in two and two admins know a part each, so that they have 
to both agree in order to operate as root. Be creative yourself.


Regards, Per Steffensen

On 13/04/15 12:13, Suresh Vanasekaran wrote:

Hi,

We are maintaining the solr index on a central server, and multiple users
might be able to access the index data.

May I know the best practices for securing the solr index folder, where
ideally only the application user should be able to access it? Even an admin
user should not be able to copy the data and use it in another schema.

Thanks








facet on external field

2015-04-14 Thread jainam vora
Hi,

I am using an external file field for the price field since it changes
frequently. Is it possible to generate facets using an external field? How?

I understand that faceting requires indexed values, and external file fields
are not actually indexed.


-- 
Thanks & Regards,
Jainam Vora


Errors during Indexing in SOLR 4.6

2015-04-14 Thread abhi Abhishek
Hi All,
     we recently migrated from SOLR 3.6 to SOLR 4; while indexing in SOLR 4
we are getting the below exception.

Apr 1, 2015 9:22:57 AM org.apache.solr.common.SolrException log

SEVERE: null:org.apache.solr.common.SolrException: Exception writing
document id 932684555 to the index; possible analysis error.

at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)

at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)

at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)

Caused by: java.lang.IllegalArgumentException: first position increment
must be > 0 (got 0) for field 'DataEnglish'

at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:131)



This works perfectly fine in SOLR 3.6. Can someone help in debugging this?
Any fixes/solutions?


Thanks in Advance.


Best Regards,

Abhishek


Re: Java.net.socketexception: broken pipe Solr 4.10.2

2015-04-14 Thread vsilgalis
Right now index size is about 10GB on each shard (yes, I could use more RAM),
but I'm looking more for a step-up than a step-down approach.  I will try
adding more RAM to these machines as my next step.

1. Zookeeper is external to these boxes in a three node cluster with more
than enough RAM to keep everything off disk.

2. OS disk cache: when I add more RAM, I will just add it as RAM for the
machine and not to the Java heap, unless you recommend otherwise.

3. Java heap looks good so far; GC is minimal as far as I can tell, but I can
look into this some more.

4. we do have 2 cores per machine, but the second core is a joke (10MB)

note: zkClientTimeout is set to 30 for safety's sake.

java settings:
-XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+ParallelRefProcEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=6000 -XX:CMSTriggerPermRatio=80 -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=1 -XX:PretenureSizeThreshold=64m -XX:+CMSScavengeBeforeRemark -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxTenuringThreshold=8 -XX:TargetSurvivorRatio=90 -XX:SurvivorRatio=4 -XX:NewRatio=3 -XX:-UseSuperWord -Xmx5588m -Xms1596m





Re: Indexing PDF and MS Office files

2015-04-14 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

Here are the solrconfig.xml and the error log from the Solr logs for your
reference. As mentioned earlier, I didn't make any changes to
solrconfig.xml; I am using the out-of-the-box file that came
with the default installation.

Please let me know your thoughts on why these issues are occurring.

Thanks & Regards
Vijay


Vijay Bhoomireddy, Big Data Architect

1000 Great West Road, Brentford, London, TW8 9DW
T: +44 20 3475 7980
M: +44 7481 298 360
W: http://www.whishworks.com/

On 14 April 2015 at 15:57, Vijaya Narayana Reddy Bhoomi Reddy 
vijaya.bhoomire...@whishworks.com wrote:

 Hi,

 I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
 .pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
 Please let me know what is going wrong with the indexing
 process.

 I am using Solr 4.10.2 with the default example server configuration
 that comes with the Solr distribution.

 PDF Files - Indexing as such works fine, but when I query using *:* in the
 Solr Query console, metadata information is displayed properly. However,
 the PDF content field is empty. This is happening for all PDF files I have
 tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
 the PDF file, content is not being displayed.

 MS Office files - For some office files, everything works perfectly and the
 extracted content is visible in the query console. However, for others, I
 see the below error message during the indexing process.

 Exception in thread "main"
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.microsoft.OfficeParser


 I am using SolrJ to index the documents and below is the code snippet
 related to indexing. Please let me know where the issue is occurring.

 static String solrServerURL = "http://localhost:8983/solr";
 static SolrServer solrServer = new HttpSolrServer(solrServerURL);
 static ContentStreamUpdateRequest indexingReq =
     new ContentStreamUpdateRequest("/update/extract");

 indexingReq.addFile(file, fileType);
 indexingReq.setParam("literal.id", literalId);
 indexingReq.setParam("uprefix", "attr_");
 indexingReq.setParam("fmap.content", "content");
 indexingReq.setParam("literal.fileurl", fileURL);
 indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 solrServer.request(indexingReq);

 Thanks & Regards
 Vijay




<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 For more details about configurations options that may appear in
 this file, see http://wiki.apache.org/solr/SolrConfigXml.
-->
<config>
  <!-- In all configuration below, a prefix of "solr." for class names
       is an alias that causes solr to search appropriate packages,
       including org.apache.solr.(search|update|request|core|analysis)

       You may also specify a fully qualified Java classname if you
       have your own custom plugins.
    -->

  <!-- Controls what version of Lucene various components of Solr
       adhere to.  Generally, you want to use the latest version to
       get all bug fixes and improvements. It is highly recommended
       that you fully re-index after changing this setting as it can
       affect both how text is indexed and queried.
  -->
  <luceneMatchVersion>4.10.2</luceneMatchVersion>

  <!-- <lib/> directives can be used to instruct Solr to load any Jars
       identified and use them to resolve any "plugins" specified in
       your solrconfig.xml or schema.xml (ie: Analyzers, Request

Re: Indexing PDF and MS Office files

2015-04-14 Thread Vijaya Narayana Reddy Bhoomi Reddy
Andrea,

Yes, I am using the stock schema.xml that comes with the example server of
Solr 4.10.2. Hence I am not sure why the PDF content is not getting extracted
and put into the content field in the index.

Please find the log information for the Parsing error below.


org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
... 32 more
Caused by: java.lang.IllegalArgumentException: This paragraph is not the
first one in the table
at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:932)
at
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:188)
at
org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:172)
at
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:98)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:167)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 35 more

ERROR - 2015-04-14 14:51:21.151; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at

Re: Indexing PDF and MS Office files

2015-04-14 Thread Andrea Gazzarini

It seems something like https://issues.apache.org/jira/browse/TIKA-1251.
I see you're using Solr 4.10.2 which uses Tika 1.5 and that issue seems 
to be fixed in Tika 1.6.


I agree with Erick: you should try with another version of Tika.

Best,
Andrea

On 04/14/2015 06:44 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:

Andrea,

Yes, I am using the stock schema.xml that comes with the example server of
Solr 4.10.2. Hence I am not sure why the PDF content is not getting extracted
and put into the content field in the index.

Please find the log information for the Parsing error below.


org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
... 32 more
Caused by: java.lang.IllegalArgumentException: This paragraph is not the
first one in the table
at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:932)
at
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:188)
at
org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:172)
at
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:98)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:167)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 35 more

ERROR - 2015-04-14 14:51:21.151; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@138b0c5
at

sort by a copy field error

2015-04-14 Thread Pedro Figueiredo
Hello,

 

I have a pretty basic question:  how can I sort by a copyfield?

 

My schema conf is:

 

<field name="name" type="text_general_edge_ngram" indexed="true"
stored="true" omitNorms="true" termVectors="true"/>

<field name="name_sort" type="string" indexed="true" stored="false"/>

<copyField source="name" dest="name_sort" />

 

And when I try to sort by name_sort the following error is raised: 

error: {
    "msg": "sort param field can't be found: name_sort",
    "code": 400
}

 

Thanks in advance,

 

Pedro Figueiredo

 



[ANNOUNCE] Apache Solr 5.1.0 released

2015-04-14 Thread Timothy Potter
14 April 2015 - The Lucene PMC is pleased to announce the release of
Apache Solr 5.1.0.

Solr 5.1.0 is available for immediate download at:
http://www.apache.org/dyn/closer.cgi/lucene/solr/5.1.0

Solr 5.1.0 includes 39 new features, 40 bug fixes, and 36 optimizations
/ other changes from over 60 unique contributors.

For detailed information about what is included in the 5.1.0 release,
please see: http://lucene.apache.org/solr/5_1_0/changes/Changes.html

Enjoy!


Re: Indexing PDF and MS Office files

2015-04-14 Thread Erick Erickson
Looks like this is just a file that Tika can't handle, based on this line:

bq: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser

You might be able to get some joy by parsing this from Java to see
if a more recent Tika would fix it. Here's some sample code:

http://lucidworks.com/blog/indexing-with-solrj/
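
In outline, standalone extraction looks something like this (a minimal
sketch against the plain Tika API; which Tika jar you run it against and
the file path are up to you):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaCheck {
  public static void main(String[] args) throws Exception {
    // Feed the failing document to standalone Tika to see whether
    // extraction succeeds outside Solr.
    try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
      BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
      Metadata metadata = new Metadata();
      new AutoDetectParser().parse(in, handler, metadata, new ParseContext());
      System.out.println(handler.toString());
    }
  }
}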

Best,
Erick

On Tue, Apr 14, 2015 at 9:44 AM, Vijaya Narayana Reddy Bhoomi Reddy
vijaya.bhoomire...@whishworks.com wrote:
 Andrea,

 Yes, I am using the stock schema.xml that comes with the example server of
 Solr 4.10.2. Hence I am not sure why the PDF content is not getting extracted
 and put into the content field in the index.

 Please find the log information for the Parsing error below.


 org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.microsoft.OfficeParser@138b0c5
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.tika.exception.TikaException: Unexpected
 RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@138b0c5
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
 ... 32 more
 Caused by: java.lang.IllegalArgumentException: This paragraph is not the
 first one in the table
 at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:932)
 at
 org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:188)
 at
 org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:172)
 at
 org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:98)
 at
 org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
 at
 org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:167)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 

Indexing PDF and MS Office files

2015-04-14 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
.pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
Please let me know what is going wrong with the indexing process.

I am using Solr 4.10.2 with the default example server configuration
that comes with the Solr distribution.

PDF Files - Indexing as such works fine, but when I query using *:* in the
Solr Query console, metadata information is displayed properly. However,
the PDF content field is empty. This is happening for all PDF files I have
tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
the PDF file, content is not being displayed.

MS Office files - For some office files, everything works perfectly and the
extracted content is visible in the query console. However, for others, I
see the below error message during the indexing process.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser


I am using SolrJ to index the documents and below is the code snippet
related to indexing. Please let me know where the issue is occurring.

static String solrServerURL = "http://localhost:8983/solr";
static SolrServer solrServer = new HttpSolrServer(solrServerURL);
static ContentStreamUpdateRequest indexingReq =
    new ContentStreamUpdateRequest("/update/extract");

indexingReq.addFile(file, fileType);
indexingReq.setParam("literal.id", literalId);
indexingReq.setParam("uprefix", "attr_");
indexingReq.setParam("fmap.content", "content");
indexingReq.setParam("literal.fileurl", fileURL);
indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(indexingReq);

Thanks & Regards
Vijay



Re: Indexing PDF and MS Office files

2015-04-14 Thread Andrea Gazzarini

Hi Vijay,
Please paste an extract of your schema, where the content field (the
field where the PDF text should be) and its type are declared.

For the other issue, please paste the whole stacktrace because

org.apache.tika.parser.microsoft.OfficeParser*

says nothing. The complete stacktrace (or at least another three / four 
lines) should contain some other detail.


Best,
Andrea

On 04/14/2015 04:57 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:

Hi,

I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
.pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
Please let me know what is going wrong with the indexing process.

I am using Solr 4.10.2 with the default example server configuration
that comes with the Solr distribution.

PDF Files - Indexing as such works fine, but when I query using *:* in the
Solr Query console, metadata information is displayed properly. However,
the PDF content field is empty. This is happening for all PDF files I have
tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
the PDF file, content is not being displayed.

MS Office files - For some office files, everything works perfectly and the
extracted content is visible in the query console. However, for others, I
see the below error message during the indexing process.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser


I am using SolrJ to index the documents and below is the code snippet
related to indexing. Please let me know where the issue is occurring.

static String solrServerURL = "http://localhost:8983/solr";
static SolrServer solrServer = new HttpSolrServer(solrServerURL);
static ContentStreamUpdateRequest indexingReq =
    new ContentStreamUpdateRequest("/update/extract");

indexingReq.addFile(file, fileType);
indexingReq.setParam("literal.id", literalId);
indexingReq.setParam("uprefix", "attr_");
indexingReq.setParam("fmap.content", "content");
indexingReq.setParam("literal.fileurl", fileURL);
indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(indexingReq);

Thanks & Regards
Vijay





Re: Problem related to filter on Zero value for DateField

2015-04-14 Thread Jack Krupansky
What does your main query look like? Normally we don't speak of searching
with the fq parameter - it filters the results, but the actual searching is
done via the main query with the q parameter.
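
For example, in SolrJ the distinction looks like this (a sketch only; the
field names are taken from your message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryVsFilter {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");       // the actual search (q)
    query.addFilterQuery("document_status:7");    // fq only restricts the results
    query.addFilterQuery("p_date:\"0002-11-30T00:00:00Z\"");
    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getResults().getNumFound());
  }
}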

-- Jack Krupansky

On Tue, Apr 14, 2015 at 4:17 AM, Ali Nazemian alinazem...@gmail.com wrote:

 Dears,
 Hi,
 I have a strange problem with Solr 4.10.x. When I search on the Solr zero
 date, which is "0002-11-30T00:00:00Z", and more than one filter is applied,
 the results become invalid. For example, consider this scenario:
 When I search for a document with fq=p_date:"0002-11-30T00:00:00Z", Solr
 returns three different documents, which is right for my collection. All of
 these three documents have the same value of 7 for document status. Now if I
 search with fq=document_status:7, the same three documents are returned, which
 is also a correct response. But when I search with
 fq=document_status:7&fq=p_date:"0002-11-30T00:00:00Z", Solr returns
 nothing (0 documents)! I have no such problem with date values other than
 the Solr zero date ("0002-11-30T00:00:00Z"). Please let me know whether this
 is a bug in Solr or I did something wrong.
 Best regards.

 --
 A.Nazemian



RE: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread Dyer, James
Elisabeth,

Currently ConjunctionSolrSpellChecker only supports adding
WordBreakSolrSpellchecker to IndexBased-, FileBased-, or DirectSolrSpellChecker.
In the future, it would be great if it could handle other Spell Checker
combinations.  For instance, if you had an (e)dismax query that searches
multiple fields, to have a separate spellchecker for each of them.

But CSSC is not hardened for this more general usage, as hinted in the API doc.
The check done to ensure all spellcheckers use the same stringdistance object
is, I believe, a safeguard against using this class for functionality it is not
able to correctly support.  It looks to me that SOLR-6271 was opened to fix the
bug that it compares references on the stringdistance.  This is not a
problem with WBSSC because that one does not support string distance at all.

What you're hoping for, however, is for the requirement that the string
distances be the same to be removed entirely.  You could try modifying the code
by removing the check.  However, beware that you might not get the results you
desire!  But should that happen, please go ahead and fix it for your use case
and then donate the code.  This is something I've personally wanted for a long
time.
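
For example, a looser check might compare the implementation classes rather
than the instances -- an untested sketch, and exactly the kind of change that
might not give the results you desire:

import org.apache.lucene.search.spell.StringDistance;
import org.apache.solr.spelling.SolrSpellChecker;

class DistanceCheck {
  // Accept two checkers whose StringDistance implementations are the same
  // class, even if they are distinct instances that do not compare equal.
  static void checkStringDistance(StringDistance expected, SolrSpellChecker checker) {
    StringDistance actual = checker.getStringDistance();
    if (expected != null && actual != null
        && !expected.getClass().equals(actual.getClass())) {
      throw new IllegalArgumentException(
          "All checkers need to use the same StringDistance.");
    }
  }
}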

James Dyer
Ingram Content Group


-Original Message-
From: elisabeth benoit [mailto:elisaelisael...@gmail.com] 
Sent: Tuesday, April 14, 2015 7:37 AM
To: solr-user@lucene.apache.org
Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

Hello,

I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
FileBasedSpellchecker in same request.

I've applied the change from patch 135.patch (cf. SOLR-6271). I tried running
the command patch -p1 -i 135.patch --dry-run but it didn't work, maybe
because the patch was a fix for Solr 4.9, so I just replaced this line in
ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance!!! 1: " +
      checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

but still, when I send a spellcheck request, I get the error

msg: "All checkers need to use the same StringDistance!!! 1:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

From the error message I gather both spellcheckers use the same distance
measure, LuceneLevenshteinDistance, but they're not the same instance of
LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?

Thanks,
Elisabeth


Re: Indexing PDF and MS Office files

2015-04-14 Thread Andrea Gazzarini

Hi,
solrconfig.xml (especially if you didn't touch it) should be good. What 
about the schema? Are you using the one that comes with the download 
bundle, too?


I don't see the stacktrace... did you forget to paste it?

Best,
Andrea

On 04/14/2015 06:06 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:

Hi,

Here are the solrconfig.xml and the error log from the Solr logs for your
reference. As mentioned earlier, I didn't make any changes to
solrconfig.xml; I am using the out-of-the-box file that came
with the default installation.

Please let me know your thoughts on why these issues are occurring.

Thanks & Regards
Vijay




Vijay Bhoomireddy, Big Data Architect

1000 Great West Road, Brentford, London, TW8 9DW
T: +44 20 3475 7980
M: +44 7481 298 360
W: http://www.whishworks.com/


On 14 April 2015 at 15:57, Vijaya Narayana Reddy Bhoomi Reddy
vijaya.bhoomire...@whishworks.com wrote:


Hi,

I am trying to index PDF and Microsoft Office files (.doc, .docx,
.ppt, .pptx, .xls, and .xlsx) into Solr. I am facing the
following issues. Please let me know what is going
wrong with the indexing process.

I am using Solr 4.10.2 with the default example server
configuration that comes with the Solr distribution.

PDF Files - Indexing as such works fine, but when I query using
*:* in the Solr Query console, metadata information is displayed
properly. However, the PDF content field is empty. This is
happening for all PDF files I have tried. I have tried with some
proprietary files, PDF eBooks etc. Whatever the PDF file,
content is not being displayed.

MS Office files - For some office files, everything works perfectly
and the extracted content is visible in the query console.
However, for others, I see the below error message during the
indexing process.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser

I am using SolrJ to index the documents and below is the code
snippet related to indexing. Please let me know where the issue is
occurring.

static String solrServerURL = "http://localhost:8983/solr";
static SolrServer solrServer = new HttpSolrServer(solrServerURL);
static ContentStreamUpdateRequest indexingReq =
    new ContentStreamUpdateRequest("/update/extract");

indexingReq.addFile(file, fileType);
indexingReq.setParam("literal.id", literalId);
indexingReq.setParam("uprefix", "attr_");
indexingReq.setParam("fmap.content", "content");
indexingReq.setParam("literal.fileurl", fileURL);
indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(indexingReq);

Thanks & Regards
Vijay








Re: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread elisabeth benoit
Thanks for your answer!

I didn't realize this was not supposed to be done (a conjunction of
DirectSolrSpellChecker and FileBasedSpellChecker). I got this idea from the
mailing list while searching for a way to get a list of words to
ignore for the DirectSolrSpellChecker.

Well well well, I'll try removing the check and see what happens. I'm not a
java programmer, but if I can find a simple solution I'll let you know.

Thanks again,
Elisabeth

2015-04-14 16:29 GMT+02:00 Dyer, James james.d...@ingramcontent.com:

 Elisabeth,

 Currently ConjunctionSolrSpellChecker only supports adding
 WordBreakSolrSpellchecker to IndexBased-, FileBased-, or
 DirectSolrSpellChecker.  In the future, it would be great if it could
 handle other Spell Checker combinations.  For instance, if you had an
 (e)dismax query that searches multiple fields, to have a separate
 spellchecker for each of them.

 But CSSC is not hardened for this more general usage, as hinted in the API
 doc.  The check done to ensure all spellcheckers use the same
 stringdistance object is, I believe, a safeguard against using this class
 for functionality it is not able to correctly support.  It looks to me that
 SOLR-6271 was opened to fix the bug that it compares references on
 the stringdistance.  This is not a problem with WBSSC because that one does
 not support string distance at all.

 What you're hoping for, however, is for the requirement that the string
 distances be the same to be removed entirely.  You could try modifying the
 code by removing the check.  However, beware that you might not get the
 results you desire!  But should that happen, please go ahead and fix it
 for your use case and then donate the code.  This is something I've
 personally wanted for a long time.

 James Dyer
 Ingram Content Group


 -Original Message-
 From: elisabeth benoit [mailto:elisaelisael...@gmail.com]
 Sent: Tuesday, April 14, 2015 7:37 AM
 To: solr-user@lucene.apache.org
 Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr
 4.10.1

 Hello,

 I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
 FileBasedSpellchecker in same request.

 I've applied the change from patch 135.patch (cf. SOLR-6271). I tried running
 the command patch -p1 -i 135.patch --dry-run but it didn't work, maybe
 because the patch was a fix for Solr 4.9, so I just replaced this line in
 ConjunctionSolrSpellChecker

 else if (!stringDistance.equals(checker.getStringDistance())) {
   throw new IllegalArgumentException(
       "All checkers need to use the same StringDistance.");
 }


 by

 else if (!stringDistance.equals(checker.getStringDistance())) {
   throw new IllegalArgumentException(
       "All checkers need to use the same StringDistance!!! 1: " +
       checker.getStringDistance() + " 2: " + stringDistance);
 }

 as it was done in the patch

 but still, when I send a spellcheck request, I get the error

 msg: "All checkers need to use the same StringDistance!!! 1:
 org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
 org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

 From the error message I gather both spellcheckers use the same distance
 measure, LuceneLevenshteinDistance, but they're not the same instance of
 LuceneLevenshteinDistance.

 Is the condition all right? What should be done to fix this properly?

 Thanks,
 Elisabeth



proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Ian Rose
Hi all -

I've just upgraded my dev install of Solr (cloud) from 4.10 to 5.0.  Our
client is written in Go, for which I am not aware of an existing Solr client,
so we wrote our own.  One tricky bit for this was the routing logic; if a
document has routing prefix X and belongs to collection Y, we need to know
which solr node to connect to.  Previously we accomplished this by watching
the clusterstate.json file in zookeeper - at startup and whenever it changes,
the client parses the file contents to build a routing table.

However, in 5.0 newly created collections do not show up in clusterstate.json
but instead have their own state.json document.  Are there any
recommendations for how to handle this from the client?  The obvious answer
is to watch every collection's state.json document, but we run a lot of
collections (~1000 currently, and growing) so I'm concerned about keeping
that many watches open at the same time (should I be?).  How does the SolrJ
client handle this?

Thanks!
- Ian


RE: Securing solr index

2015-04-14 Thread Davis, Daniel (NIH/NLM) [C]
That's a good point - if he's talking about securing the Solr filesystem, he 
can use standard mechanisms.

You can also go beyond user/group/other permissions if your filesystem supports
it.  You can use POSIX ACLs on many local Linux filesystems.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk] 
Sent: Tuesday, April 14, 2015 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing solr index

Hi

I might misunderstand you, but if you are talking about securing the actual 
files/folders of the index, I do not think this is a Solr/Lucene concern. Use 
standard mechanisms of your OS. E.g. on linux/unix use chown, chgrp, chmod, 
sudo, apparmor etc - e.g. allowing only root to write the folders/files and 
sudo the user running Solr/Lucene to operate as root in this area. Even admins 
should not (normally) operate as root
- that way they cannot write the files either. No one knows the root-password - 
except maybe for the super-super-admin, or you split the root-password in two 
and two admins know a part each, so that they have to both agree in order to 
operate as root. Be creative yourself.

Regards, Per Steffensen

On 13/04/15 12:13, Suresh Vanasekaran wrote:
 Hi,

 We are maintaining the solr index on a central server, and multiple
 users might be able to access the index data.

 May I know the best practices for securing the solr index folder, where
 ideally only the application user should be able to access it? Even an admin
 user should not be able to copy the data and use it in another schema.

 Thanks







Re: Indexing PDF and MS Office files

2015-04-14 Thread Jack Krupansky
Try doing a manual extraction request directly to Solr (not via SolrJ) and
use the extractOnly option to see if the content is actually extracted.

See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Also, some PDF files actually have the content as a bitmap image, so no
text is extracted.


-- Jack Krupansky

On Tue, Apr 14, 2015 at 10:57 AM, Vijaya Narayana Reddy Bhoomi Reddy 
vijaya.bhoomire...@whishworks.com wrote:

 Hi,

 I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
 .pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
 Please let me know what is going wrong with the indexing process.

 I am using Solr 4.10.2 with the default example server configuration
 that comes with the Solr distribution.

 PDF Files - Indexing as such works fine, but when I query using *:* in the
 Solr Query console, metadata information is displayed properly. However,
 the PDF content field is empty. This is happening for all PDF files I have
 tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
 the PDF file, content is not being displayed.

 MS Office files - For some office files, everything works perfectly and the
 extracted content is visible in the query console. However, for others, I
 see the below error message during the indexing process.

 Exception in thread "main"
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.microsoft.OfficeParser


 I am using SolrJ to index the documents and below is the code snippet
 related to indexing. Please let me know where the issue is occurring.

 static String solrServerURL = "http://localhost:8983/solr";
 static SolrServer solrServer = new HttpSolrServer(solrServerURL);
 static ContentStreamUpdateRequest indexingReq =
     new ContentStreamUpdateRequest("/update/extract");

 indexingReq.addFile(file, fileType);
 indexingReq.setParam("literal.id", literalId);
 indexingReq.setParam("uprefix", "attr_");
 indexingReq.setParam("fmap.content", "content");
 indexingReq.setParam("literal.fileurl", fileURL);
 indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 solrServer.request(indexingReq);

 Thanks & Regards
 Vijay




Re: Indexing PDF and MS Office files

2015-04-14 Thread Shyam R
Vijay,

You could try different Excel files with different formats to rule out
whether the issue is with the Tika version being used.

Thanks
Murthy

On Wed, Apr 15, 2015 at 9:35 AM, Terry Rhodes trhodes...@gmail.com wrote:

 Perhaps the PDF is protected and the content cannot be extracted?

 I have an unverified suspicion that the Tika shipped with Solr 4.10.2 may
 not support some/all Office 2013 document formats.





 On 4/14/2015 8:18 PM, Jack Krupansky wrote:

 Try doing a manual extraction request directly to Solr (not via SolrJ) and
 use the extractOnly option to see if the content is actually extracted.

 See:
 https://cwiki.apache.org/confluence/display/solr/
 Uploading+Data+with+Solr+Cell+using+Apache+Tika

 Also, some PDF files actually have the content as a bitmap image, so no
 text is extracted.


 -- Jack Krupansky

 On Tue, Apr 14, 2015 at 10:57 AM, Vijaya Narayana Reddy Bhoomi Reddy 
 vijaya.bhoomire...@whishworks.com wrote:

  Hi,

 I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
 .pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
 Please let me know what is going wrong with the indexing
 process.

 I am using Solr 4.10.2 with the default example server configuration
 that comes with the Solr distribution.

 PDF Files - Indexing as such works fine, but when I query using *:* in the
 Solr Query console, metadata information is displayed properly. However,
 the PDF content field is empty. This is happening for all PDF files I have
 tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
 the PDF file, content is not being displayed.

 MS Office files - For some office files, everything works perfectly and the
 extracted content is visible in the query console. However, for others, I
 see the below error message during the indexing process.

 Exception in thread "main"
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.microsoft.OfficeParser


 I am using SolrJ to index the documents and below is the code snippet
 related to indexing. Please let me know where the issue is occurring.

 static String solrServerURL = "http://localhost:8983/solr";
 static SolrServer solrServer = new HttpSolrServer(solrServerURL);
 static ContentStreamUpdateRequest indexingReq =
     new ContentStreamUpdateRequest("/update/extract");

 indexingReq.addFile(file, fileType);
 indexingReq.setParam("literal.id", literalId);
 indexingReq.setParam("uprefix", "attr_");
 indexingReq.setParam("fmap.content", "content");
 indexingReq.setParam("literal.fileurl", fileURL);
 indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 solrServer.request(indexingReq);

 Thanks & Regards
 Vijay






-- 
Ph: 9845704792


Re: Indexing PDF and MS Office files

2015-04-14 Thread Terry Rhodes

Perhaps the PDF is protected and the content cannot be extracted?

I have an unverified suspicion that the Tika shipped with Solr 4.10.2
may not support some/all Office 2013 document formats.





On 4/14/2015 8:18 PM, Jack Krupansky wrote:

Try doing a manual extraction request directly to Solr (not via SolrJ) and
use the extractOnly option to see if the content is actually extracted.

See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Also, some PDF files actually have the content as a bitmap image, so no
text is extracted.


-- Jack Krupansky

On Tue, Apr 14, 2015 at 10:57 AM, Vijaya Narayana Reddy Bhoomi Reddy 
vijaya.bhoomire...@whishworks.com wrote:


Hi,

I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt,
.pptx, .xls, and .xlsx) into Solr. I am facing the following issues.
Please let me know what is going wrong with the indexing process.

I am using Solr 4.10.2 with the default example server configuration
that comes with the Solr distribution.

PDF Files - Indexing as such works fine, but when I query using *:* in the
Solr Query console, metadata information is displayed properly. However,
the PDF content field is empty. This is happening for all PDF files I have
tried. I have tried with some proprietary files, PDF eBooks etc. Whatever
the PDF file, content is not being displayed.

MS Office files - For some office files, everything works perfectly and the
extracted content is visible in the query console. However, for others, I
see the below error message during the indexing process.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser


I am using SolrJ to index the documents and below is the code snippet
related to indexing. Please let me know where the issue is occurring.

 static String solrServerURL = 
http://localhost:8983/solr;;
static SolrServer solrServer = new HttpSolrServer(solrServerURL);
 static ContentStreamUpdateRequest indexingReq = new

 ContentStreamUpdateRequest(/update/extract);

 indexingReq.addFile(file, fileType);
indexingReq.setParam(literal.id, literalId);
indexingReq.setParam(uprefix, attr_);
indexingReq.setParam(fmap.content, content);
indexingReq.setParam(literal.fileurl, fileURL);
indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(indexingReq);

Thanks  Regards
Vijay

--
The contents of this e-mail are confidential and for the exclusive use of
the intended recipient. If you receive this e-mail in error please delete
it from your system immediately and notify us either by e-mail or
telephone. You should not copy, forward or otherwise disclose the content
of the e-mail. The views expressed in this communication may not
necessarily be the view held by WHISHWORKS.





Re: Java.net.socketexception: broken pipe Solr 4.10.2

2015-04-14 Thread jaime spicciati
We ran into this during our indexing process running on 4.10.3. After
increasing ZooKeeper timeouts, client timeouts, and socket timeouts, and
implementing retry logic in our loading process, the thing that actually
worked was changing the hard commit timing. We were performing a hard
commit every 5 minutes, and after a couple of hours of loading data some of
the shards would start going down because they would time out with
ZooKeeper and/or close connections. Changing the timeouts just moved the
problem later in the ingest process.

Through a combination of decreasing the hard commit interval to 15 seconds
and migrating to the G1 garbage collector, we are able to prevent ingest
failures. For us, the periodic stop-the-world garbage collections were
causing connections to be closed and other nasty things such as ZooKeeper
timeouts that would cause recovery to kick in. (Soft commits are turned off
until the full ingest/baseline completes.) I believe that until a hard
commit is issued Solr keeps the data in memory, which explains why we were
experiencing nasty garbage collections.

The other change we made, which may have helped, is that we ensured the
socket timeouts were in sync between the Jetty instance running Solr and
the SolrJ client loading the data. During some of our batch updates Solr
would take a couple of minutes to respond, and I believe in some instances
the server side of the socket would be closed (the maxIdleTime setting in
Jetty).
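
As a rough sketch (the values here are illustrative assumptions, not
recommendations), aligning the SolrJ timeouts with Jetty looked something
like:

    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    // time allowed to establish the TCP connection, in ms
    server.setConnectionTimeout(30000);
    // socket read timeout, in ms; keep this >= Jetty's maxIdleTime
    server.setSoTimeout(120000);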

Hope this helps,
Jaime Spicciati

Thanks
Jaime


On Tue, Apr 14, 2015 at 9:26 AM, vsilgalis vsilga...@gmail.com wrote:

 Right now the index size is about 10GB on each shard (yes, I could use
 more RAM), but I'm looking for a step-up rather than a step-down approach.
 I will try adding more RAM to these machines as my next step.

 1. ZooKeeper is external to these boxes in a three-node cluster with more
 than enough RAM to keep everything off disk.

 2. OS disk cache: when I add more RAM I will just add it as RAM for the
 machine, not the Java heap, unless that is something you recommend.

 3. The Java heap looks good so far; GC is minimal as far as I can tell,
 but I can look into this some more.

 4. We do have 2 cores per machine, but the second core is a joke (10MB).

 Note: zkClientTimeout is set to 30 for safety's sake.

 java settings:

 -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+ParallelRefProcEnabled
 -XX:+CMSParallelRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:CMSTriggerPermRatio=80 -XX:CMSInitiatingOccupancyFraction=50
 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=1
 -XX:PretenureSizeThreshold=64m -XX:+CMSScavengeBeforeRemark
 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:+UseConcMarkSweepGC
 -XX:+UseParNewGC -XX:MaxTenuringThreshold=8 -XX:TargetSurvivorRatio=90
 -XX:SurvivorRatio=4 -XX:NewRatio=3 -XX:-UseSuperWord -Xmx5588m -Xms1596m



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Java-net-socketexception-broken-pipe-Solr-4-10-2-tp4199484p4199561.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: [ANNOUNCE] Apache Solr 5.1.0 released

2015-04-14 Thread Anshum Gupta
Hi Joe,

This should help you:
http://lucene.apache.org/solr/5_1_0/changes/Changes.html#v5.1.0.upgrading_from_solr_5.0

On Tue, Apr 14, 2015 at 12:47 PM, Joseph Obernberger 
j...@lovehorsepower.com wrote:

 Great news!
 Any tips on how to do an upgrade from 5.0.0 to 5.1.0?
 Thank you!

 -Joe






-- 
Anshum Gupta


Re: sort by a copy field error

2015-04-14 Thread Shawn Heisey
On 4/14/2015 11:32 AM, Pedro Figueiredo wrote:
 And when I try to sort by name_sort the following error is raised: 

 "error": {
   "msg": "sort param field can't be found: name_sort",
   "code": 400
 }

What was the exact sort parameter you sent to Solr?

Did you reload the core or restart Solr and then reindex after you
changed your schema?  A reindex will be required.

http://wiki.apache.org/solr/HowToReindex
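
For reference, a minimal SolrJ sketch of a well-formed sort on that field
(the URL and collection name are assumptions; the field must exist in the
reloaded schema and the data must be reindexed first):

    SolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.setSort("name_sort", SolrQuery.ORDER.asc);  // sends sort=name_sort asc
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());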

Thanks,
Shawn



Re: sort by a copy field error

2015-04-14 Thread Andrea Gazzarini
Hi Pedro
Please post the request that produces that error

Andrea
On 14 Apr 2015 19:33, Pedro Figueiredo pjlfigueir...@criticalsoftware.com
wrote:

 Hello,



 I have a pretty basic question: how can I sort by a copyField?



 My schema conf is:



 <field name="name" type="text_general_edge_ngram" indexed="true"
 stored="true" omitNorms="true" termVectors="true"/>

 <field name="name_sort" type="string" indexed="true" stored="false"/>

 <copyField source="name" dest="name_sort"/>



 And when I try to sort by name_sort the following error is raised:

 "error": {
   "msg": "sort param field can't be found: name_sort",
   "code": 400
 }



 Thanks in advance,



 Pedro Figueiredo






Re: proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Hrishikesh Gadre
Hi Ian,

As per my understanding, SolrJ does not use ZooKeeper watches but instead
caches the information (along with a TTL). You can find more information
here:

https://issues.apache.org/jira/browse/SOLR-5473
https://issues.apache.org/jira/browse/SOLR-5474
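
If you do go the per-collection watch route from your own client, the shape
of it (shown here as a rough plain-ZooKeeper Java sketch, assuming the 5.x
layout /collections/<name>/state.json) would be roughly:

    void watchCollection(final ZooKeeper zk, final String collection)
        throws Exception {
      final String path = "/collections/" + collection + "/state.json";
      byte[] data = zk.getData(path, new Watcher() {
        public void process(WatchedEvent event) {
          // ZooKeeper watches fire once, so re-register after every event
          try {
            watchCollection(zk, collection);
          } catch (Exception ignored) {}
        }
      }, null);
      // parse 'data' and rebuild the routing table for this collection here
    }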

Regards
Hrishikesh


On Tue, Apr 14, 2015 at 8:49 AM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 I've just upgraded my dev install of Solr (cloud) from 4.10 to 5.0.  Our
 client is written in Go, for which I am not aware of an existing client,
 so we wrote our own.  One tricky bit was the routing logic: if a document
 has routing prefix X and belongs to collection Y, we need to know which
 Solr node to connect to.  Previously we accomplished this by watching the
 clusterstate.json file in ZooKeeper - at startup and whenever it changes,
 the client parses the file contents to build a routing table.

 However, in 5.0 newly created collections do not show up in
 clusterstate.json but instead have their own state.json document.  Are
 there any recommendations for how to handle this from the client?  The
 obvious answer is to watch every collection's state.json document, but we
 run a lot of collections (~1000 currently, and growing), so I'm concerned
 about keeping that many watches open at the same time (should I be?).  How
 does the SolrJ client handle this?

 Thanks!
 - Ian



Re: Disable or limit the size of Lucene field cache

2015-04-14 Thread pras.venkatesh
Thank you, this really helps.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-or-limit-the-size-of-Lucene-field-cache-tp4198798p4199646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Ian Rose
Hi Hrishikesh,

Thanks for the pointers - I had not looked at SOLR-5474
https://issues.apache.org/jira/browse/SOLR-5474 previously.  Interesting
approach...  I think we will stick with trying to keep ZK watches open from
all clients to all collections for now, but if that starts to become a
bottleneck it's good to know the route that SolrJ has chosen...

cheers,
Ian





Re: [ANNOUNCE] Apache Solr 5.1.0 released

2015-04-14 Thread Joseph Obernberger

Great news!
Any tips on how to do an upgrade from 5.0.0 to 5.1.0?
Thank you!

-Joe





JSON Facet Analytics API in Solr 5.1

2015-04-14 Thread Yonik Seeley
Folks, there's a new JSON Facet API in the just-released Solr 5.1
(actually, a new facet module under the covers too).

It's marked as experimental so we have time to change the API based on
your feedback.  So let us know what you like, what you would change,
what's missing, or any other ideas you may have!

I've just started the documentation for the reference guide (on our
confluence wiki), so for now the best doc is on my blog:

http://yonik.com/json-facet-api/
http://yonik.com/solr-facet-functions/
http://yonik.com/solr-subfacets/
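
For a quick taste from SolrJ (a minimal sketch; the collection URL and the
field name "cat" are just placeholders):

    SolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);  // only the facet results are wanted
    q.add("json.facet", "{categories : {type : terms, field : cat}}");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResponse().get("facets"));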

I'll also be hanging out more on the #solr-dev IRC channel on freenode
if you want to hit me up there about any development ideas.

-Yonik


using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread elisabeth benoit
Hello,

I am using Solr 4.10.1 and trying to use the DirectSolrSpellChecker and
the FileBasedSpellChecker in the same request.

I've applied the change from 135.patch (cf. SOLR-6271). I tried running
the command patch -p1 -i 135.patch --dry-run, but it didn't work, maybe
because the patch was a fix for Solr 4.9, so I just replaced the line in
ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance!!! 1: " +
      checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

but still, when I send a spellcheck request, I get the error:

msg: "All checkers need to use the same StringDistance!!!
1: org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3
2: org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

From the error message I gather that both spellcheckers use the same
distance measure, LuceneLevenshteinDistance, but they are not the same
instance of LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?
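
One workaround I'm considering (an untested sketch, and it assumes
LuceneLevenshteinDistance does not override equals(), so two separately
created instances never compare equal) is to compare the classes instead of
the instances:

    else if (!stringDistance.getClass().equals(
        checker.getStringDistance().getClass())) {
      throw new IllegalArgumentException(
          "All checkers need to use the same StringDistance.");
    }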

Thanks,
Elisabeth


Re: [ANNOUNCE] Apache Solr 5.1.0 released

2015-04-14 Thread Timothy Potter
I apologize - Yonik prepared these nice release notes for 5.1 and I
neglected to include them:

Solr 5.1 Release Highlights:

 * The new Facet Module, including the JSON Facet API.
   This module is currently marked as experimental to allow for
further API feedback and improvements.

 * A new JSON request API.
   This feature is currently marked as experimental to allow for
further API feedback and improvements.

 * The ability to upload and download Solr configurations via SolrJ
(CloudSolrClient).

 * First-class support for Real-Time Get in SolrJ.

 * Spatial 2D heat-map faceting.

 * EnumField now has docValues support.

 * API to dynamically add Jars to Solr's classpath for plugins.

 * Ability to enable/disable individual stats in the StatsComponent.

 * lucene/solr query syntax to give any query clause a constant score
(see the small sketch after this list).

 * Schema API enhancements to remove or replace fields, dynamic
fields, field types and copy fields.

 * When posting XML or JSON to Solr with curl, there is no need to
specify the content type.

 * A list of update processors to be used for an update can be
specified dynamically for any given request.

 * StatsComponent now supports Percentiles.

 * facet.contains option to limit which constraints are returned.

 * Streaming Aggregation for SolrCloud.

 * The admin UI now visualizes Lucene segment information.

 * Parameter substitution / macro expansion across entire request
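
For the constant-score clause, a hedged SolrJ sketch (the field name
"description" and its value are placeholders; ^= is the new 5.1 operator):

    SolrQuery q = new SolrQuery("description:blue^=1.0");
    // every document matching description:blue scores exactly 1.0,
    // regardless of tf/idf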


On Tue, Apr 14, 2015 at 11:42 AM, Timothy Potter thelabd...@gmail.com wrote:
 14 April 2015 - The Lucene PMC is pleased to announce the release of
 Apache Solr 5.1.0.

 Solr 5.1.0 is available for immediate download at:
 http://www.apache.org/dyn/closer.cgi/lucene/solr/5.1.0

 Solr 5.1.0 includes 39 new features, 40 bug fixes, and 36 optimizations
 / other changes from over 60 unique contributors.

 For detailed information about what is included in 5.1.0 release,
 please see: http://lucene.apache.org/solr/5_1_0/changes/Changes.html

 Enjoy!