Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.

2012-12-20 Thread Shigeki Kobayashi
Thanks Abe-san!

Your advice is very informative.

Thanks again.


Regards,

Shigeki


2012/12/21 Shinichiro Abe 

> You can place the missing JAR files in contrib/extraction/lib.
>
> For class files: asm-x.x.jar
> For mp4 files: aspectjrt-x.x.jar
>
> FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209
>
> Regards,
> Shinichiro Abe
>
> On 2012/12/21, at 15:08, Shigeki Kobayashi wrote:
>
> > Hi,
> >
> > I use ManifoldCF 1.1-dev to crawl files and index them into Solr 4.0.
> >
> > While indexing class files and mp4 files, Solr threw a NoClassDefFoundError
> > as follows:
> >
> > [stack traces snipped; they appear in full in the original message below]

Solr 3.5: java.lang.NegativeArraySizeException caused by negative start value

2012-12-20 Thread Shawn Heisey

This is on Solr 3.5.0.

We are getting a java.lang.NegativeArraySizeException when our webapp 
sends a query where the start parameter is set to a negative value. 
This seems to set off a denial-of-service problem within Solr.  I don't 
yet know whether it's a mistake in our code or whether some malicious 
user has found an attack vector on our site.


After the first exception, another exception 
(org.mortbay.jetty.EofException) appears in the logs with increasing 
frequency.  Within minutes of the first exception, the load balancer 
complains about having no servers available because ping requests are 
failing.


This is distributed search, but the shards parameter is in 
solrconfig.xml, not provided by the client.
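
Until the root cause is pinned down, clamping the parameter on the client side
is cheap insurance.  A minimal sketch, assuming the webapp builds its queries
with SolrJ (class name and row cap are illustrative, not from the original
report):

    import org.apache.solr.client.solrj.SolrQuery;

    public class SafePaging {
        private static final int MAX_ROWS = 100; // illustrative cap

        // Clamp user-supplied paging values before they reach Solr, so a
        // negative start can never trigger the NegativeArraySizeException.
        public static SolrQuery buildQuery(String userQuery, int start, int rows) {
            SolrQuery query = new SolrQuery(userQuery);
            query.setStart(Math.max(0, start));
            query.setRows(Math.min(Math.max(0, rows), MAX_ROWS));
            return query;
        }
    }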


Full exception:

Dec 20, 2012 7:41:34 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NegativeArraySizeException
        at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:108)
        at org.apache.solr.handler.component.ShardFieldSortedHitQueue.<init>(ShardDoc.java:139)
        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:712)
        at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:571)
        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:550)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



Later exceptions:

Dec 21, 2012 12:24:37 AM org.apache.solr.common.SolrException log
SEVERE: org.mortbay.jetty.EofException
        at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
        at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
        at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
        at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.Ht

Which token filter can combine 2 terms into 1?

2012-12-20 Thread Xi Shen
Hi,

I am looking for a token filter that can combine two terms into one. E.g.,
the input has been tokenized by whitespace:

t1 t2 t2a t3

I want a filter that outputs:

t1 t2t2a t3

I know it is a very special case, and I am thinking about developing a filter
of my own, but I cannot figure out which API I should use to walk the terms
in a TokenStream.
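
For what it's worth, the usual pattern is to extend TokenFilter, read terms
through CharTermAttribute, and consume extra tokens from the wrapped stream
inside incrementToken().  A minimal sketch that merges adjacent pairs
unconditionally (the real merge condition, plus offset and position-increment
handling, is left out; captureState()/restoreState() can buffer a lookahead
token you decide not to merge):

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Sketch only: merges each adjacent pair of tokens into a single term.
    public final class PairConcatFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        public PairConcatFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false; // upstream stream is exhausted
            }
            String first = termAtt.toString();
            if (input.incrementToken()) {
                // Concatenate the buffered term with the one just read.
                String second = termAtt.toString();
                termAtt.setEmpty().append(first).append(second);
            } else {
                // Odd number of tokens: emit the last one unchanged.
                termAtt.setEmpty().append(first);
            }
            return true;
        }
    }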


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84


Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.

2012-12-20 Thread Shinichiro Abe
You can place the missing JAR files in contrib/extraction/lib.

For class files: asm-x.x.jar
For mp4 files: aspectjrt-x.x.jar

FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209
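
For reference, a sketch of where those files go, assuming a stock Solr 4.0
layout ($SOLR_INSTALL and the x.x version placeholders are illustrative); the
stock example solrconfig.xml already loads this directory through a <lib>
directive:

    cp asm-x.x.jar aspectjrt-x.x.jar $SOLR_INSTALL/contrib/extraction/lib/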

Regards,
Shinichiro Abe

On 2012/12/21, at 15:08, Shigeki Kobayashi wrote:

> Hi,
> 
> I use ManifoldCF 1.1-dev to crawl files and index them into Solr 4.0.
> 
> While indexing class files and mp4 files, Solr threw a NoClassDefFoundError
> as follows:
> 
>>> Indexing an mp4 file
> 
> 2012-12-19 06:16:48,485%P[solr.servlet.SolrDispatchFilter]-[TP-Processor44]-:null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
>         at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>         at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
>         at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
>         at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
>         at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
>         at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
>         at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
>         at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:117)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>         ... 18 more
> Caused by: java.lang.ClassNotFoundException: org.aspectj.lang.Signature
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         ... 29 more
> 
> --
>>> Indexing a class file
> 
> 2012-12-19 08:10:58,327%P[solr.servlet.SolrDispatchFilter]-[TP-Processor3]-:null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/objectweb/asm/ClassVisitor
>         at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
>         at org.apache.catalina.core.ApplicationFilterChain.inter

Reply:How to add the extra analyzer jar?

2012-12-20 Thread SuoNayi
The issue has been solved; sorry for my negligence.




At 2012-12-21 11:10:53,SuoNayi  wrote:
>Hi all, for SolrCloud (Solr 4.0), how do I add a third-party analyzer?
>There is a third-party analyzer JAR and I want to integrate it with SolrCloud.
>Here are my steps, but a ClassNotFoundException is thrown on startup.
>1. Add the fieldType to schema.xml; here is a snippet:
>
>
>
>
>isMaxWordLength="false"/>  
>words="stopwords.txt"/>  
>generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
>splitOnCaseChange="1"/>  
>  
>protected="protwords.txt"/>  
>
>
>  
>isMaxWordLength="false"/>  
>words="stopwords.txt"/>  
>generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
>splitOnCaseChange="1"/>
>  
>protected="protwords.txt"/>  
>
>  
>
>2. Add the IKAnalyzer.cfg.xml and stopword.dic files to the classes
>directory of solr.war (open the WAR and add those two files).
>3. Start up with start.jar; the ClassNotFoundException is thrown.
>
>
>Could someone help me figure out what's wrong, or tell me where I can add the
>extra/third-party JAR to the classpath of SolrCloud?
>
>
>Thanks,
>
>
>SuoNayi
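
For anyone who lands on the same ClassNotFoundException: repacking solr.war is
usually unnecessary.  Solr can load extra JARs from a directory named in
solrconfig.xml, and a lib/ directory under the core's instance dir is picked
up automatically.  A sketch (the directory path is illustrative):

    <!-- In solrconfig.xml: pull in third-party analyzer JARs at core load time. -->
    <lib dir="/opt/solr/extra-libs" regex=".*\.jar" />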


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Alexandre Rafalovitch
I actually agree (about not surprising the users). But forgetting this value
may also lead to some serious debugging issues.

An interesting (not sure if reasonable) compromise would be to look at the
error path for @version=1 combined with the @multiValued attribute: make sure
Solr actually complains if it sees such a combination, and that the message
explicitly says "What's your @version value? Maybe it needs to be
explicit/more recent". Same with autoGeneratePhraseQueries and @version=1.4.

Then somebody patching together a config file from multiple sources will
be guided in the right direction.

Just a newbie-oriented thought. I am sure there are other, more pressing
things in the pipeline.

Regards,
 Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Dec 21, 2012 at 4:49 PM, Chris Hostetter
wrote:

>
> : On the other hand, having @version default to 1.0 is probably an oversight,
> : given the number of changes present. Should it not default to the latest,
> : or at least to 1.5 (and change periodically)?
>
> If the default value changed, then users w/o a version attribute in their
> schema would suddenly get very different behavior if they upgraded from
> one version of Solr to the next.
>
>
> -Hoss
>


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Chris Hostetter

: On the other hand, having @version default to 1.0 is probably an oversight,
: given the number of changes present. Should it not default to the latest, or
: at least to 1.5 (and change periodically)?

If the default value changed, then users w/o a version attribute in their 
schema would suddenly get very different behavior if they upgraded from 
one version of Solr to the next.


-Hoss


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Alexandre Rafalovitch
Thank you.

So, the conclusion for me is that @name can be skipped. It is not used in
anything (or anything critical, anyway) and there is a default. That's good
enough for me.

On the other hand, having @version default to 1.0 is probably an oversight,
given the number of changes present. Should it not default to the latest, or
at least to 1.5 (and change periodically)?
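
For anyone assembling a schema from multiple sources, pinning both attributes
explicitly avoids relying on the 1.0 default at all; a sketch (the name value
is illustrative, 1.5 being the version shipped with the 4.x example schema):

    <schema name="my-collection" version="1.5">
      <!-- types, fields, copyFields ... -->
    </schema>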

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Dec 21, 2012 at 7:50 AM, Jack Krupansky wrote:

> Yeah... not sure how I missed it, but my search sees it now.
>
> Also, the name will default to "schema.xml" if you leave it out of the
> schema.
>
> -- Jack Krupansky
>
> -Original Message- From: Mikhail Khludnev
> Sent: Thursday, December 20, 2012 3:06 PM
> To: solr-user
> Subject: Re: Where does schema.xml's schema/@name displays?
>
>
> Jack,
> FWIW I've found occurrence in SystemInfoHandler.java
>
>
> On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky *
> *wrote:
>
>  I checked the 4.x source code and except for the fact that you will get a
>> warning if you leave it out, nothing uses that name. But... that's not to
>> say that a future release might not require it - the doc/comments don't
>> explicitly say that it is optional.
>>
>> Note that the version attribute is optional (as per the source code, but
>> no mention in doc/comments) and defaults to 1.0, with no warning.
>>
>> -- Jack Krupansky
>>
>>
>> -Original Message- From: Alexandre Rafalovitch
>> Sent: Thursday, December 20, 2012 12:08 AM
>> To: solr-user@lucene.apache.org
>> Subject: Where does schema.xml's schema/@name displays?
>>
>> Hello,
>>
>> In the schema.xml, we have a name attribute on the root node. The
>> documentation says it is for display purpose only. But for display where?
>>
>> It seems that the admin console uses the name in the solr.xml file
>> instead.
>> And deleting the name attribute does not seem to cause any problems
>> either.
>>
>> The reason I ask is because I am writing an explanation example which
>> involves schema.xml config file being copied and modified over and over
>> again. If @name is significant, I need to mention changing it. If not, I
>> will just delete it altogether.
>>
>> Regards,
>>   Alex.
>>
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-20 Thread Otis Gospodnetic
Hi,

Have a look at http://search-lucene.com/?q=invalid+version+javabin

Otis
--
Solr Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson wrote:

> Hi,
>
> I'm encountering this error randomly when running a distributed facet.
>  (i.e. I'm sending the exact same request, yet this does not reproduce
> consistently)
> I have about  180 shards that are being queried.
> It seems that when Solr distributes the request to the shards, one, or
> perhaps more, of the shards return an XML reply instead of javabin.
>
> I added some debug output to JavaBinCodec.unmarshal (as done in the
> debugging.patch of SOLR-3258) to check whether the XML reply holds an error
> or not, and I noticed that the XML actually holds the response from one of
> the shards.
>
> I'm using the patch provided in SOLR-2894 on top of trunk 1404975.
>
> Has anyone encountered such an issue? Any ideas?
>
> Thanks,
>
> Shahar.
>
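
A decoding hint for this class of error: a javabin response must begin with a
version byte of 2, while an XML body begins with '<', whose ASCII code is 60.
So "expected 2, but 60" is consistent with what Shahar observed, namely a
shard answering with XML where javabin was expected.  A trivial check:

    public class JavabinVersionByte {
        public static void main(String[] args) {
            // Javabin responses start with version byte 2; an XML body starts
            // with '<', which is ASCII 60, hence "expected 2, but 60".
            System.out.println((int) '<'); // prints 60
        }
    }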


Re: jconsole over jmx - should threads be visible?

2012-12-20 Thread Shawn Heisey

On 12/20/2012 4:57 PM, Chris Hostetter wrote:

i just tried running the 4x solr example with the jetty options to allow
remote JMX...

java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false  
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.port=1099 -jar start.jar

...and was then able to monitor using jconsole and see all of the thread
info as well from a remote machine.


I already had -Dcom.sun.management.jmxremote.authenticate=true.  No 
threads information was available.  When I changed that to false and 
restarted Jetty, I could suddenly see threads.  Therefore I think that 
read-write permission must be required.  I wonder if Oracle could be 
convinced that this is a bug.
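
If authentication is still wanted, the standard JMX flags let you keep
authenticate=true while granting a role read-write access; a sketch (file
paths are illustrative; the role names follow the stock jmxremote.access
format):

    java -Dcom.sun.management.jmxremote \
         -Dcom.sun.management.jmxremote.port=1099 \
         -Dcom.sun.management.jmxremote.ssl=false \
         -Dcom.sun.management.jmxremote.authenticate=true \
         -Dcom.sun.management.jmxremote.password.file=/etc/solr/jmxremote.password \
         -Dcom.sun.management.jmxremote.access.file=/etc/solr/jmxremote.access \
         -jar start.jar

    # /etc/solr/jmxremote.access
    monitorRole   readonly
    controlRole   readwrite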


Thanks for the help!

Shawn



RE: occasional GC crashes

2012-12-20 Thread Otis Gospodnetic
Hi Robi,

Oh, that's a thing of the past. Go for the latest Java 7 if they let you!

Otis
--
Performance Monitoring - http://sematext.com/spm
On Dec 20, 2012 6:29 PM, "Petersen, Robert"  wrote:

> Hi Otis,
>
> I thought Java 7 had a bug, which wasn't being addressed by Oracle, that
> made it unsuitable for Solr.  Did that get fixed now?
> http://searchhub.org/2011/07/28/dont-use-java-7-for-anything/
>
> I did see this but it doesn't really mention the bug:
> http://opensearchnews.com/2012/04/announcing-java7-support-with-apache-solr-and-lucene/
>
> Thanks
> Robi
>
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Tuesday, December 18, 2012 5:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: occasional GC crashes
>
> Robert,
>
> Step 1 is to get the latest Java 7 or if you have to remain on 6 then use
> the latest 6.
>
> Otis
> --
> SOLR Performance Monitoring - http://sematext.com/spm On Dec 18, 2012
> 7:54 PM, "Petersen, Robert"  wrote:
>
> >  Hi solr user group,
> >
> > Sorry if this isn't directly a Solr question.  Seems like once in a
> > blue moon the GC crashes on a server in our Solr 3.6.1 slave farm.
> > This seems to only happen on a couple of the twelve slaves we have
> > deployed and only very rarely on those.  It seems like this doesn't
> > directly affect solr because in the logs it looks like solr keeps
> > working after the time of the exception but our external monitoring
> > tool reports that the solr service is down so our operations department
> restarts solr on that box and alerts me.
> > The solr logs show nothing unusual.  The exception does show up in the
> > catalina.out log file though.  Does this happen to anyone else?  Here is
> > the basic error and I have attached the crash dump file also.   Our total
> > uptime on these boxes is over a year now BTW.
> >
> > [JVM crash report snipped; it appears in full in the original message below]
Re: Store document while using Solr

2012-12-20 Thread Otis Gospodnetic
Hi,

You can use Solr's DataImportHandler to index files in the file system.
 You could set things up in such a way that Solr keeps indexing whatever
you put in some specific location in the FS.  This is not the most common
setup, but it's certainly possible.  Solr keeps the searchable index in its
own directory, defined in one of its configs.
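
A minimal data-config.xml sketch of that setup (directory, entity, and field
names are illustrative, not a tested config): FileListEntityProcessor walks a
directory tree, and TikaEntityProcessor extracts text from each file it finds.

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/data/incoming" fileName=".*" recursive="true"
                rootEntity="false" dataSource="null">
          <entity name="tika" processor="TikaEntityProcessor"
                  url="${files.fileAbsolutePath}" dataSource="bin" format="text">
            <field column="text" name="content"/>
          </entity>
        </entity>
      </document>
    </dataConfig>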

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Thu, Dec 20, 2012 at 8:15 PM, Nicholas Li  wrote:

> hi there,
>
> I am quite new to Solr and have a very basic question about storing and
> indexing the document.
>
> I am trying with the Solr example, and when I run a command like 'java -jar
> post.jar foo/test.xml', it gives me the feeling that Solr will index the
> given file, no matter where it is stored, and Solr won't re-store this file
> to some other location in the file system.  Am I correct?
>
> If I want to use the file system to manage the documents, it seems better
> to define some location, which will be used to store all the potential
> files (it may need some processing to move/copy/upload the files to this
> location), then use Solr to index them under this location. Am I correct?
>
> Cheers,
> Nick
>


Store document while using Solr

2012-12-20 Thread Nicholas Li
hi there,

I am quite new to Solr and have a very basic question about storing and
indexing the document.

I am trying with the Solr example, and when I run a command like 'java -jar
post.jar foo/test.xml', it gives me the feeling that Solr will index the
given file, no matter where it is stored, and Solr won't re-store this file
to some other location in the file system.  Am I correct?

If I want to use the file system to manage the documents, it seems better
to define some location, which will be used to store all the potential
files (it may need some processing to move/copy/upload the files to this
location), then use Solr to index them under this location. Am I correct?

Cheers,
Nick


Re: jconsole over jmx - should threads be visible?

2012-12-20 Thread Chris Hostetter

: If I connect jconsole to a remote Solr installation (or any app) using jmx,
: all the graphs are populated except 'threads' ... is this expected, or have I
: done something wrong?  I can't seem to locate the answer with google.

i just tried running the 4x solr example with the jetty options to allow 
remote JMX...

java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false  
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.port=1099 -jar start.jar 

...and was then able to monitor using jconsole and see all of the thread 
info as well from a remote machine.

-Hoss


Re: Japanese exact match results do not show on top of results

2012-12-20 Thread Robert Muir
I think you are hitting SOLR-3589. There is a vote underway for a 3.6.2
release that contains this fix.
On Dec 20, 2012 6:29 PM, "kirpakaro"  wrote:

> Hi folks,
>
> I am having couple of problems with Japanese data, 1. it is not
> properly
> indexing all the data 2. displaying the exact match result on top and then
> 90%match and 80%match etc. does not work.
>  I am using solr3.6.1 and using text_ja as the fieldType here is the schema
>
>
>
> multiValued="true"/>
>
>
>  
>
> What I want to achieve: if there is an exact query match, it should provide
> the results from q_e first, followed by partial-match results from the q
> field; if there is nothing in the q_e field, then partial matches should
> come from the q field.  This is how I specify the query:
>
> http://localhost:7983/zoom/jp/select/?q=鹿児島
> 鹿児島銀行&rows=10&version=2.2&qf=query+query_exact^1&mm=90%25&pf=q^1+q_e^10
> OR
> &version=2.2&rows=10&qf=q+q_e^1&pf=query^10+query_exact^1
>
> Somehow the exact-query-match results do not come out on top, even though
> the data contains them. It is puzzling that not all the documents get
> indexed properly; if I change the q field to string and q_e to text_ja,
> then all the records are indexed properly, but that still does not solve
> the problem of the exact match on top followed by partial matches.
>
> text_ja field uses:
> 
>  tags="../../../solr/conf/lang/stoptags_ja.txt"
> enablePositionIncrements="true"/>
>   
>  words="../../../solr/conf/lang/stopwords_ja.txt"
> enablePositionIncrements="true" />
>  
>   
>
> How can I solve this problem?
>
> Thanks
>
>
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Japanese-exact-match-results-do-not-show-on-top-of-results-tp4028422.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Japanese exact match results do not show on top of results

2012-12-20 Thread kirpakaro
Hi folks,

I am having a couple of problems with Japanese data: 1. it is not properly
indexing all the data; 2. displaying the exact-match result on top, followed
by 90% matches, 80% matches, etc., does not work.
I am using Solr 3.6.1 with text_ja as the fieldType; here is the schema:


   
   
   

 

What I want to achieve: if there is an exact query match, it should provide
the results from q_e first, followed by partial-match results from the q
field; if there is nothing in the q_e field, then partial matches should come
from the q field.  This is how I specify the query:

http://localhost:7983/zoom/jp/select/?q=鹿児島
鹿児島銀行&rows=10&version=2.2&qf=query+query_exact^1&mm=90%25&pf=q^1+q_e^10
OR
&version=2.2&rows=10&qf=q+q_e^1&pf=query^10+query_exact^1

Somehow the exact-query-match results do not come out on top, even though the
data contains them. It is puzzling that not all the documents get indexed
properly; if I change the q field to string and q_e to text_ja, then all
the records are indexed properly, but that still does not solve the problem
of the exact match on top followed by partial matches.

text_ja field uses:


  

 
  

How can I solve this problem?

Thanks
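
One thing worth double-checking while waiting on the fix: the two URLs above
boost different field names in qf and pf (query/query_exact in one, q/q_e in
the other).  If both pairs refer to the same fields, boosting the exact-match
field consistently in both parameters is closer to the stated goal; a sketch
(handler and boost values illustrative):

    &defType=edismax&qf=query^1+query_exact^10&pf=query^1+query_exact^10&mm=90%25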










--
View this message in context: 
http://lucene.472066.n3.nabble.com/Japanese-exact-match-results-do-not-show-on-top-of-results-tp4028422.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: occasional GC crashes

2012-12-20 Thread Petersen, Robert
Hi Otis,

I thought Java 7 had a bug, which wasn't being addressed by Oracle, that made 
it unsuitable for Solr.  Did that get fixed now?
http://searchhub.org/2011/07/28/dont-use-java-7-for-anything/

I did see this but it doesn't really mention the bug:  
http://opensearchnews.com/2012/04/announcing-java7-support-with-apache-solr-and-lucene/

Thanks
Robi


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Tuesday, December 18, 2012 5:25 PM
To: solr-user@lucene.apache.org
Subject: Re: occasional GC crashes

Robert,

Step 1 is to get the latest Java 7 or if you have to remain on 6 then use the 
latest 6.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm On Dec 18, 2012 7:54 PM, 
"Petersen, Robert"  wrote:

>  Hi solr user group,
>
>
> Sorry if this isn't directly a Solr question.  Seems like once in a 
> blue moon the GC crashes on a server in our Solr 3.6.1 slave farm.  
> This seems to only happen on a couple of the twelve slaves we have 
> deployed and only very rarely on those.  It seems like this doesn't 
> directly affect solr because in the logs it looks like solr keeps 
> working after the time of the exception but our external monitoring 
> tool reports that the solr service is down so our operations department 
> restarts solr on that box and alerts me.
> The solr logs show nothing unusual.  The exception does show up in the 
> catalina.out log file though.  Does this happen to anyone else?  Here is
> the basic error and I have attached the crash dump file also.   Our total
> uptime on these boxes is over a year now BTW.
>
>
> #
>
> # A fatal error has been detected by the Java Runtime Environment:
>
> #
>
> #  SIGSEGV (0xb) at pc=0x2b5379346612, pid=13724, 
> tid=1082353984
>
> #
>
> # JRE version: 6.0_25-b06
>
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode
> linux-amd64 )
>
> # Problematic frame:
>
> # V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned
> long)+0x82
>
> #
>
> # An error report file with more information is saved as:
>
> # /var/LucidWorks/lucidworks/hs_err_pid13724.log
>
> #
>
> # If you would like to submit a bug report, please visit:
>
> #   http://java.sun.com/webapps/bugreport/crash.jsp
>
> #
>
>
> VM Arguments:
>
> jvm_args:
> -Djava.util.logging.config.file=/var/LucidWorks/lucidworks/tomcat/conf
> /logging.properties 
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Xmx32768m -Xms32768m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=6060
> -Djava.endorsed.dirs=/var/LucidWorks/lucidworks/tomcat/endorsed
> -Dcatalina.base=/var/LucidWorks/lucidworks/tomcat
> -Dcatalina.home=/var/LucidWorks/lucidworks/tomcat
> -Djava.io.tmpdir=/var/LucidWorks/lucidworks/tomcat/temp 
>
> java_command: org.apache.catalina.startup.Bootstrap -server 
> -Dsolr.solr.home=lucidworks/solr start
>
> Launcher Type: SUN_STANDARD
>
>
> Stack: [0x,0x],  
> sp=0x40835eb0, free space=1056983k
>
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, 
> C=native
> code)
>
> V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned
> long)+0x82
>
> V  [libjvm.so+0x3c481a]  
> CMSConcMarkingTask::do_work_steal(int)+0xfa
>
> V  [libjvm.so+0x3c3dcf]  CMSConcMarkingTask::work(int)+0xef
>
> V  [libjvm.so+0x8783dc]  YieldingFlexibleGangWorker::loop()+0xbc
>
> V  [libjvm.so+0x8755b4]  GangWorker::run()+0x24
>
> V  [libjvm.so+0x71096f]  java_start(Thread*)+0x13f
>
>
> Heap
>
> par new generation   total 345024K, used 180672K [0x2e12,
> 0x2aaac578, 0x2aaac578)
>
>   eden space 306688K,  53% used [0x2e12, 
> 0x2aaab8243c28,
> 0x2aaac0ca)
>
>   from space 38336K,  40% used [0x2aaac321, 
> 0x2aaac415c3f8,
> 0x2aaac578)
>
>   to   space 38336K,   0% used [0x2aaac0ca, 0x2aaac0ca,
> 0x2aaac321)
>
> concurrent mark-sweep generation total 33171072K, used 12144213K 
> [0x2aaac578, 0x2ab2ae12, 0x2ab2ae12)
>
> concurrent-mark-sweep perm gen total 83968K, used 50650K 
> [0x2ab2ae12, 0x2ab2b332, 0x2ab2b332)
>
>
> Code Cache  [0x2b054000, 0x2b9a4000, 0x2e054000)
>
> total_blobs=2800 nmethods=2273 adapters=480 free_code_cache=40752512
> largest_free_block=15808
>
>
>
> Thanks,
>
>
> Robert (Robi) Petersen
>
> Senior Software Engineer
>
> Search Department
>
>



Re: Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Upayavira
Right, you can store it, but you can't search on it that way, and you
certainly can't do complex searches that take the XML structure into
account (e.g. xpath queries).

Upayavira

On Thu, Dec 20, 2012, at 10:22 PM, Alexandre Rafalovitch wrote:
> What happens if you just supply it as CDATA into a string field? Store,
> no
> index, probably compressed and lazy.
> 
> Regards,
> Alex
> On 20 Dec 2012 09:30, "Modou DIA"  wrote:
> 
> > Hi everybody,
> >
> > I'm a newbie with Solr technologies, but in the past I worked with Lucene
> > and another solution similar to Solr.
> > I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
> > a Cocoon 2.1 application.
> >
> > I want to know if it's possible to store (without indexing) a field
> > containing an XML sequence. I mean a field which can store XML data in
> > the index without losing XPath information.
> >
> > For example, this is a document to index:
> >
> > 
> >   
> > id_1
> > testing
> > 
> >   
> > testing
> >   
> > 
> >   
> > ...
> > 
> >
> > As you can see, the field named subdoc contains an xml sequence.
> >
> > So, when I query the indexes, I want to retrieve the data in subdoc
> > and preserve the XML markup.
> >
> > Thank you for your help.
> > --
> > --
> > | Modou DIA
> > | modo...@gmail.com
> > --
> >


Re: Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Alexandre Rafalovitch
What happens if you just supply it as CDATA into a string field? Store, no
index, probably compressed and lazy.
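
A sketch of that approach (field name illustrative): define a stored-only
string field in schema.xml and wrap the markup in CDATA when posting, so Solr
hands the XML back verbatim at query time.

    <!-- schema.xml: keep the raw XML, but don't index it. -->
    <field name="subdoc_xml" type="string" indexed="false" stored="true"/>

    <!-- in the update message: CDATA keeps the markup intact. -->
    <field name="subdoc_xml"><![CDATA[<subdoc><title>testing</title></subdoc>]]></field>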

Regards,
Alex
On 20 Dec 2012 09:30, "Modou DIA"  wrote:

> Hi everybody,
>
> I'm a newbie with Solr technologies, but in the past I worked with Lucene
> and another solution similar to Solr.
> I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
> a Cocoon 2.1 application.
>
> I want to know if it's possible to store (without indexing) a field
> containing an XML sequence. I mean a field which can store XML data in
> the index without losing XPath information.
>
> For example, this is a document to index:
>
> 
>   
> id_1
> testing
> 
>   
> testing
>   
> 
>   
> ...
> 
>
> As you can see, the field named subdoc contains an xml sequence.
>
> So, when I query the indexes, I want to retrieve the data in subdoc
> and preserve the XML markup.
>
> Thank you for your help.
> --
> --
> | Modou DIA
> | modo...@gmail.com
> --
>


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Jack Krupansky

Yeah... not sure how I missed it, but my search sees it now.

Also, the name will default to "schema.xml" if you leave it out of the 
schema.


-- Jack Krupansky

-Original Message- 
From: Mikhail Khludnev

Sent: Thursday, December 20, 2012 3:06 PM
To: solr-user
Subject: Re: Where does schema.xml's schema/@name displays?

Jack,
FWIW I've found occurrence in SystemInfoHandler.java


On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky 
wrote:



I checked the 4.x source code and except for the fact that you will get a
warning if you leave it out, nothing uses that name. But... that's not to
say that a future release might not require it - the doc/comments don't
explicitly say that it is optional.

Note that the version attribute is optional (as per the source code, but
no mention in doc/comments) and defaults to 1.0, with no warning.

-- Jack Krupansky


-Original Message- From: Alexandre Rafalovitch
Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lucene.apache.org
Subject: Where does schema.xml's schema/@name displays?

Hello,

In the schema.xml, we have a name attribute on the root node. The
documentation says it is for display purpose only. But for display where?

It seems that the admin console uses the name in the solr.xml file 
instead.
And deleting the name attribute does not seem to cause any problems 
either.


The reason I ask is because I am writing an explanation example which
involves schema.xml config file being copied and modified over and over
again. If @name is significant, I need to mention changing it. If not, I
will just delete it altogether.

Regards,
  Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch

- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread alxsss
Depending on your architecture, why not index the same data into two machines?
One will be your prod, the other your backup.

Thanks.
Alex.

 

 

 

-Original Message-
From: Upayavira 
To: solr-user 
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups


You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
> that the index is never in a bogus state. All data files are written and 
> flushed to disk, then the segments.* files are written that match the 
> data files. You can capture the files with a set of hard links to create 
> a backup.
> 
> The CheckIndex program will verify the index backup.
> java -cp yourcopy/lucene-core-SOMETHING.jar 
> org.apache.lucene.index.CheckIndex collection/data/index
> 
> lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
> Solr is unpacked.
> 
> On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> > Hi all.
> >
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> >
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, but we are not yet live.
> >
> > I can think of the following ways to do this, each with various 
> > downsides:
> >
> > 1) Just backup the existing index files whilst indexing continues
> > + Easy
> > + Fast
> > - Incomplete
> > - Potential for corruption? (e.g. partial files)
> >
> > 2) Stop/Start Tomcat
> > + Easy
> > - Very slow and I/O, CPU intensive
> > - Client gets errors when trying to connect
> >
> > 3) Block/unblock SolR port with IpTables
> > + Fast
> > - Client gets errors when trying to connect
> > - Have to wait for existing transactions to complete (not sure 
> > how, maybe watch socket FD's in /proc)
> >
> > 4) Pause/Restart SolR service
> > + Fast ? (hopefully)
> > - Client gets errors when trying to connect
> >
> > In any event, the web app will have to gracefully handle 
> > unavailability of SolR, probably by displaying a "down for 
> > maintenance" message, but this should preferably be only a very short 
> > amount of time.
> >
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> >
> > Thanks for any input you can provide!
> >
> > -Andy
> >
> 

 


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Mikhail Khludnev
Jack,
FWIW I've found occurrence in SystemInfoHandler.java


On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky wrote:

> I checked the 4.x source code and except for the fact that you will get a
> warning if you leave it out, nothing uses that name. But... that's not to
> say that a future release might not require it - the doc/comments don't
> explicitly say that it is optional.
>
> Note that the version attribute is optional (as per the source code, but
> no mention in doc/comments) and defaults to 1.0, with no warning.
>
> -- Jack Krupansky
>
>
> -Original Message- From: Alexandre Rafalovitch
> Sent: Thursday, December 20, 2012 12:08 AM
> To: solr-user@lucene.apache.org
> Subject: Where does schema.xml's schema/@name displays?
>
> Hello,
>
> In the schema.xml, we have a name attribute on the root node. The
> documentation says it is for display purpose only. But for display where?
>
> It seems that the admin console uses the name in the solr.xml file instead.
> And deleting the name attribute does not seem to cause any problems either.
>
> The reason I ask is because I am writing an explanation example which
> involves schema.xml config file being copied and modified over and over
> again. If @name is significant, I need to mention changing it. If not, I
> will just delete it altogether.
>
> Regards,
>   Alex.
>
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
> that the index is never in a bogus state. All data files are written and 
> flushed to disk, then the segments.* files are written that match the 
> data files. You can capture the files with a set of hard links to create 
> a backup.
> 
> The CheckIndex program will verify the index backup.
> java -cp yourcopy/lucene-core-SOMETHING.jar 
> org.apache.lucene.index.CheckIndex collection/data/index
> 
> lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
> Solr is unpacked.
> 
> On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> > Hi all.
> >
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> >
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, but we are not yet live.
> >
> > I can think of the following ways to do this, each with various 
> > downsides:
> >
> > 1) Just backup the existing index files whilst indexing continues
> > + Easy
> > + Fast
> > - Incomplete
> > - Potential for corruption? (e.g. partial files)
> >
> > 2) Stop/Start Tomcat
> > + Easy
> > - Very slow and I/O, CPU intensive
> > - Client gets errors when trying to connect
> >
> > 3) Block/unblock SolR port with IpTables
> > + Fast
> > - Client gets errors when trying to connect
> > - Have to wait for existing transactions to complete (not sure 
> > how, maybe watch socket FD's in /proc)
> >
> > 4) Pause/Restart SolR service
> > + Fast ? (hopefully)
> > - Client gets errors when trying to connect
> >
> > In any event, the web app will have to gracefully handle 
> > unavailability of SolR, probably by displaying a "down for 
> > maintenance" message, but this should preferably be only a very short 
> > amount of time.
> >
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> >
> > Thanks for any input you can provide!
> >
> > -Andy
> >
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Lance Norskog
To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
that the index is never in a bogus state. All data files are written and 
flushed to disk, then the segments.* files are written that match the 
data files. You can capture the files with a set of hard links to create 
a backup.
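
A sketch of that hard-link capture (paths illustrative; cp -l requires the
backup directory to live on the same filesystem as the index):

    # Near-instant snapshot: new directory entries, but no data blocks copied.
    cp -lr /var/lib/solr/data/index /backups/index.$(date +%Y%m%d%H%M%S)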


The CheckIndex program will verify the index backup.
java -cp yourcopy/lucene-core-SOMETHING.jar 
org.apache.lucene.index.CheckIndex collection/data/index


lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
Solr is unpacked.


On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other "disaster" more 
quickly than a re-index operation would yield.


I can't yet afford the "extravagance" of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
currently running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various 
downsides:


1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow and I/O, CPU intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure 
how, maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle 
unavailability of SolR, probably by displaying a "down for 
maintenance" message, but this should preferably be only a very short 
amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy





Solr/Lucene Engineer - Contract Opportunity - Raleigh, NC

2012-12-20 Thread Polak, Tom
Hi Lance,

I am an IT recruiter in Raleigh, NC. Would you, or anyone you know, be 
interested in a long-term contract opportunity for a Solr/Lucene Engineer with 
Cisco here in RTP, NC?

Thanks for your time, Lance, and have a safe and happy holiday!



Tom Polak
IT Recruiter


Experis IT Staffing
1122 Oberlin Road
Raleigh, NC 27605
T: 919 755 5838
F: 919 755 5828
C: 919 457 8530

tom.po...@experis.com
www.experis.com




Referral Program: Easy... Refer an IT Professional in your network to me today!



Re: SolrCloud: only partial results returned

2012-12-20 Thread Lili
Mark, yes, they have unique ids. Most of the time, after the 2nd JSON HTTP
post, the query will return complete results.

I believe the data was already indexed by the 1st post, since if I shut down
Solr after the 1st post and restart it, the query returns the complete result
set.

Thanks,

Lili



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-only-partial-results-returned-tp4028200p4028367.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
Are you sure a commit didn't happen in between? Also, a background merge
might have happened.

As to using a backup, you are right: just stop Solr, put the snapshot
into index/data, and restart.

Upayavira

On Thu, Dec 20, 2012, at 05:16 PM, Andy D'Arcy Jewell wrote:
> On 20/12/12 13:38, Upayavira wrote:
> > The backup directory should just be a clone of the index files. I'm
> > curious to know whether it is a cp -r or a cp -lr that the replication
> > handler produces.
> >
> > You would prevent commits by telling your app not to commit. That is,
> > Solr only commits when it is *told* to.
> >
> > Unless you use autocommit, in which case I guess you could monitor your
> > logs for the last commit, and do your backup a 10 seconds after that.
> >
> >
> Hmm. Strange - the files created by the backup API don't seem to 
> correlate exactly with the files stored under the solr data directory:
> 
> andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
> /tmp/snapshot.20121220155853703/
> /tmp/snapshot.20121220155853703/_2vq.fdx
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
> /tmp/snapshot.20121220155853703/segments_2vs
> /tmp/snapshot.20121220155853703/_2vq_nrm.cfs
> /tmp/snapshot.20121220155853703/_2vq.fnm
> /tmp/snapshot.20121220155853703/_2vq_nrm.cfe
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
> /tmp/snapshot.20121220155853703/_2vq.fdt
> /tmp/snapshot.20121220155853703/_2vq.si
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
> andydj@me-solr01:~$ find /var/lib/solr/data/index/
> /var/lib/solr/data/index/
> /var/lib/solr/data/index/_2w6_Lucene40_0.frq
> /var/lib/solr/data/index/_2w6.si
> /var/lib/solr/data/index/segments_2w8
> /var/lib/solr/data/index/write.lock
> /var/lib/solr/data/index/_2w6_nrm.cfs
> /var/lib/solr/data/index/_2w6.fdx
> /var/lib/solr/data/index/_2w6_Lucene40_0.tip
> /var/lib/solr/data/index/_2w6_nrm.cfe
> /var/lib/solr/data/index/segments.gen
> /var/lib/solr/data/index/_2w6.fnm
> /var/lib/solr/data/index/_2w6.fdt
> /var/lib/solr/data/index/_2w6_Lucene40_0.tim
> 
> Am I correct in thinking that to restore from this backup, I would need 
> to do the following?
> 
> 1. Stop Tomcat (or maybe just solr)
> 2. Remove all files under /var/lib/solr/data/index/
> 3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
> /var/lib/solr/data/index/
> 4. Restart Tomcat (or just solr)
> 
> 
> Thanks everyone who's pitched in on this! Once I've got this working, 
> I'll document it.
> -Andy
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 13:38, Upayavira wrote:

The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup a 10 seconds after that.


Hmm. Strange - the files created by the backup API don't seem to 
correlate exactly with the files stored under the solr data directory:


andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need 
to do the following?


1. Stop Tomcat (or maybe just solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
/var/lib/solr/data/index/

4. Restart Tomcat (or just solr)
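
In shell terms, something like this (a sketch only; the tomcat6 service
name is an assumption, and the snapshot name should match yours):

# 1-2. stop the container so nothing holds the index open, then clear it
sudo service tomcat6 stop
rm -rf /var/lib/solr/data/index/*

# 3. copy the snapshot contents into place
cp -r /tmp/snapshot.20121220155853703/* /var/lib/solr/data/index/

# 4. restart; Solr opens the restored index on startup
sudo service tomcat6 start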


Thanks everyone who's pitched in on this! Once I've got this working, 
I'll document it.

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



RE: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

2012-12-20 Thread Dyer, James
The spellchecker doesn't support checking the individual words against the index 
with "fq" applied.  This is only done for collations (and only if 
"maxCollationTries" is greater than 0).  Checking every suggested word 
individually against the index after applying filter queries is probably going 
to be very expensive no matter how you implement it.  However, someone with 
more lucene-core knowledge than I have might be able to give you better advice.

If your root problem, though, is getting good "did-you-mean"-style suggestions 
with dismax queries and mm=0, and if you want to consider the case where some 
words in the query are misspelled and others are entirely irrelevant (and can't 
be corrected), then setting "maxResultsForSuggest" to a high value might give 
you the end result you want.  Unlike with "spellcheck.collateParam.mm=100%",
it won't insist that the irrelevant terms (or a "corrected" irrelevant term)
match anything.  On the other hand, it won't assume the query is correctly
spelled just because you got some hits from it (because mm=0 will just cause
the misspelled terms to be thrown out).
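
For illustration, a request putting those pieces together might look like
this (a sketch, assuming a single core at /solr and the default /select
handler):

curl "http://localhost:8983/solr/select?q=christmas+wrapping+papr\
&defType=edismax&mm=0&fq=item:in_stock\
&spellcheck=true&spellcheck.maxResultsForSuggest=1000\
&spellcheck.count=20&spellcheck.maxCollationTries=10\
&spellcheck.maxCollations=10&spellcheck.collate=true\
&spellcheck.collateExtendedResults=true"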

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 20, 2012 8:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq params 
for default OR query

Hi James,

I don't get how the spellcheck.maxResultsForSuggest param helps with making
sure that the suggestions returned satisfy the fq params?

That's the main problem we're trying to solve; how often suggestions are
being returned is not really an issue for us at the moment.

Thanks,
Nalini


On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James
wrote:

> Instead of using spellcheck.collateParam.mm, try just setting
> spellcheck.maxResultsForSuggest to a very high value (you can use up to
> Integer.MAX_VALUE here).  So long as the user gets fewer results than
> whatever this is set for, you will get suggestions (and collations if
> desired).  I was just playing with this and if I am understanding you
> correctly, I think this combination of parameters will give you what you want:
>
> spellcheck=true
>
> spellcheck.dictionary=whatever
>
> spellcheck.maxResultsForSuggest=1000 (or whatever the cut off is
> before you don't want suggestions)
>
> spellcheck.count=20 (+/- depending on performance vs # suggestions
> required)
>
> spellcheck.maxCollationTries=10 (+/- depending on performance vs #
> suggestions required)
>
> spellcheck.maxCollations=10 (+/- depending on performance vs # suggestions
> required)
>
> spellcheck.collate=true
>
> spellcheck.collateExtendedResults=true
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Wednesday, December 19, 2012 2:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
> params for default OR query
>
> Hi James,
>
> Yup, the example you gave about sums it up. Reason we use an OR query is
> that we want the flexibility of every term not having to match but when it
> comes to corrections we want to be sure that the ones we pick will actually
> return results (we message the user with the corrected query so it would be
> bad/confusing if there were no matches for the corrections).
>
> *- by default the spellchecker doesn't see this as a problem because it has
> hits (mm=0 and "wrapping" matches something).  So you get neither
> individual words back nor collations from the spellchecker.*
> *
> *
> I think we would still get back 'papr -> paper' as a correction and
> 'christmas wrapping paper' as a collation in this case - I've seen that
> corrections are returned even for OR queries. Problem is these would be
> returned even if 'paper' doesn't exist in any docs that have item:in_stock.
>
> *- with "spellcheck.collateParam.mm=100"
> it tries to fix both "papr" and "christmas" but can't fix "christmas"
> because spelling isn't the problem here (it is an irrelevant term not in
> the index).  So while you get words suggested there are no collations.  The
> individual words would be helpful, but you're not sure because they might
> all apply to items that do not match "fq=item:in_stock".*
>
> Yup, exactly.
>
> Do you think the workaround I suggested would work (and not have terrible
> perf)? Or any other ideas?
>
> Thanks,
> Nalini
>
>
> On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James
> wrote:
>
> > Let me try and get a better idea of what you're after.  Is it that your
> > users might query a combination of irrelevant terms and misspelled terms,
> > so you want the ability to ignore the irrelevant terms but still get
> > suggestions for the misspelled terms?
> >
> > For instance if someone wanted "q=christmas wrapping
> > papr&mm=0&fq=item

Re: Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Upayavira
Solr does not support nested structures. You need to flatten your data
before indexing. You can store data in the way you did to be returned to
your users, but you will not be able to search within the XML as XML.

If you can explain the problem you are trying to solve, maybe folks here
can help you find an alternative way of getting there.
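
One common workaround is an escaped-XML pass-through field: declare the
field stored but not indexed, and send the markup entity-escaped. Solr
returns the stored value verbatim, so your client gets the markup back
intact. A sketch, reusing the field names from your example (the
stored="true" indexed="false" declaration in schema.xml is assumed):

curl "http://localhost:8983/solr/update?commit=true" \
  -H "Content-Type: text/xml" --data-binary \
  '<add><doc>
     <field name="id">id_1</field>
     <field name="subdoc">&lt;sub&gt;testing&lt;/sub&gt;</field>
   </doc></add>'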

Upayavira

On Thu, Dec 20, 2012, at 02:29 PM, Modou DIA wrote:
> Hi everybody,
> 
> I'm a newbie with Solr technologies, but in the past I worked with Lucene
> and another solution similar to Solr.
> I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
> a Cocoon 2.1 application.
> 
> I want to know if it's possible to store (without indexing) a field
> containing an XML sequence. I mean a field which can store XML data in
> the index without losing XPath information.
> 
> For example, this is a document to index:
> 
> <add>
>   <doc>
>     <field name="id">id_1</field>
>     <field name="title">testing</field>
>     <field name="subdoc">
>       <sub>
>         testing
>       </sub>
>     </field>
>   </doc>
>   ...
> </add>
> 
> As you can see, the field named subdoc contains an XML sequence.
> 
> So, when I query the indexes, I want to retrieve the data in subdoc
> and preserve the XML markup.
> 
> Thank you for your help.
> -- 
> --
> | Modou DIA
> | modo...@gmail.com
> --


Re: SolrCloud: only partial results returned

2012-12-20 Thread Mark Miller
Does all the data have unique ids?

- Mark

On Dec 19, 2012, at 8:30 PM, Lili  wrote:

> We set up SolrCloud with 2 shards and multiple separate ZooKeepers. The
> data added using HTTP POST with JSON from the tutorial sample is not
> completely returned by queries. However, if you send the same HTTP POST
> request again, or shut down the Solr instance and restart, the complete
> results will be returned.
> 
> We have tried adding "distrib=true" to the query, or even adding "shards=.".
> Still, only partial results are returned.
> 
> This happened with embedded ZooKeepers too.
> 
> However, this doesn't seem to happen if you add data with XML from the
> tutorial samples.
> 
> Any thoughts on what might be wrong or is it a known issue?
> 
> 
> Thanks,
> 
> Lili
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-only-partial-results-returned-tp4028200.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

2012-12-20 Thread Nalini Kartha
Hi James,

I don't get how the spellcheck.maxResultsForSuggest param helps with making
sure that the suggestions returned satisfy the fq params?

That's the main problem we're trying to solve; how often suggestions are
being returned is not really an issue for us at the moment.

Thanks,
Nalini


On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James
wrote:

> Instead of using spellcheck.collateParam.mm, try just setting
> spellcheck.maxResultsForSuggest to a very high value (you can use up to
> Integer.MAX_VALUE here).  So long as the user gets fewer results than
> whatever this is set for, you will get suggestions (and collations if
> desired).  I was just playing with this and if I am understanding you
> correctly, I think this combination of parameters will give you what you want:
>
> spellcheck=true
>
> spellcheck.dictionary=whatever
>
> spellcheck.maxResultsForSuggest=1000 (or whatever the cut off is
> before you don't want suggestions)
>
> spellcheck.count=20 (+/- depending on performance vs # suggestions
> required)
>
> spellcheck.maxCollationTries=10 (+/- depending on performance vs #
> suggestions required)
>
> spellcheck.maxCollations=10 (+/- depending on performance vs # suggestions
> required)
>
> spellcheck.collate=true
>
> spellcheck.collateExtendedResults=true
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Wednesday, December 19, 2012 2:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
> params for default OR query
>
> Hi James,
>
> Yup, the example you gave about sums it up. Reason we use an OR query is
> that we want the flexibility of every term not having to match but when it
> comes to corrections we want to be sure that the ones we pick will actually
> return results (we message the user with the corrected query so it would be
> bad/confusing if there were no matches for the corrections).
>
> *- by default the spellchecker doesn't see this as a problem because it has
> hits (mm=0 and "wrapping" matches something).  So you get neither
> individual words back nor collations from the spellchecker.*
> *
> *
> I think we would still get back 'papr -> paper' as a correction and
> 'christmas wrapping paper' as a collation in this case - I've seen that
> corrections are returned even for OR queries. Problem is these would be
> returned even if 'paper' doesn't exist in any docs that have item:in_stock.
>
> *- with "spellcheck.collateParam.mm=100"
> it tries to fix both "papr" and "christmas" but can't fix "christmas"
> because spelling isn't the problem here (it is an irrelevant term not in
> the index).  So while you get words suggested there are no collations.  The
> individual words would be helpful, but you're not sure because they might
> all apply to items that do not match "fq=item:in_stock".*
>
> Yup, exactly.
>
> Do you think the workaround I suggested would work (and not have terrible
> perf)? Or any other ideas?
>
> Thanks,
> Nalini
>
>
> On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James
> wrote:
>
> > Let me try and get a better idea of what you're after.  Is it that your
> > users might query a combination of irrelevant terms and misspelled terms,
> > so you want the ability to ignore the irrelevant terms but still get
> > suggestions for the misspelled terms?
> >
> > For instance if someone wanted "q=christmas wrapping
> > papr&mm=0&fq=item:in_stock", but "christmas" was not in the index and you
> > wanted to return results for just "wrapping paper", the problem is...
> >
> > - by default the spellchecker doesn't see this as a problem because it
> has
> > hits (mm=0 and "wrapping" matches something).  So you get neither
> > individual words back nor collations from the spellchecker.
> >
> > - with "spellcheck.collateParam.mm=100" it tries to fix both "papr" and
> > "christmas" but can't fix "christmas" because spelling isn't the problem
> > here (it is an irrelevant term not in the index).  So while you get words
> > suggested there are no collations.  The individual words would be
> helpful,
> > but you're not sure because they might all apply to items that do not
> match
> > "fq=item:in_stock".
> >
> > Is this the problem?
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> > Sent: Wednesday, December 19, 2012 11:20 AM
> > To: solr-user@lucene.apache.org
> > Subject: Ensuring SpellChecker returns corrections which satisfy fq
> params
> > for default OR query
> >
> > Hi,
> >
> > With the DirectSolrSpellChecker, we want to be able to make sure that the
> > corrections that are being returned satisfy the fq params of the original
> > query.
> >
> > The collate functionality helps with this but seems to only work with
> > default AND queries - o

Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Jack Krupansky
I checked the 4.x source code and except for the fact that you will get a 
warning if you leave it out, nothing uses that name. But... that's not to 
say that a future release might not require it - the doc/comments don't 
explicitly say that it is optional.


Note that the version attribute is optional (as per the source code, but no 
mention in doc/comments) and defaults to 1.0, with no warning.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lucene.apache.org
Subject: Where does schema.xml's schema/@name displays?

Hello,

In the schema.xml, we have a name attribute on the root node. The
documentation says it is for display purposes only. But for display where?

It seems that the admin console uses the name in the solr.xml file instead.
And deleting the name attribute does not seem to cause any problems either.

The reason I ask is because I am writing an explanation example which
involves the schema.xml config file being copied and modified over and over
again. If @name is significant, I need to mention changing it. If not, I
will just delete it altogether.

Regards,
  Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



RE: Putting more weight on particular column.

2012-12-20 Thread Prachi Phatak
I am sorry, I am still not clear. Do you mean I should use enterpriseid as the ID?

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Wednesday, December 19, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Putting more weight on particular column.

Hi,

If I understand correctly, you want to search against a specific field - 
enterprise id. To do that just use something like enterpriseid:(keywords).

Yes, you can sort using the sort URL parameter. This stuff is on the Wiki and
you can search it, too. :)
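
Something along these lines, as a sketch (host, core, and field names are
assumptions); fl limits which attributes come back, and sort controls the
display order:

curl "http://localhost:8983/solr/select?q=enterpriseid:P*&sort=enterpriseid+asc&fl=enterpriseid,name"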

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm On Dec 19, 2012 8:07 PM, 
"Prachi Phatak"  wrote:

> We configured Enhanced Search in our environment and while testing
> noticed the behavior below:
>
> Just searching for "P" returns the Enterprise ID which has "P" (Pankaj) as
> well as the Resource Entity which has "P" (Sowmya), and the result lists
> Sowmya above Pankaj.
>
> *   We are looking to limit the search just to the enterprise ID, and just
> display the attributes of the returned options.
> -   So based on the example below, we are looking for the returned results
> to include only Enterprise IDs which have "P" in them, but return other
> attributes of the resource for viewing only.
> *   Do we have any control over the order in which results are
> displayed, so we can influence what gets shown in the initial results
> that way?
> -   As the search was intended for the Enterprise ID, the user would expect
> the matching results for the Enterprise ID to be shown on top; e.g., in
> the scenario below, Pankaj should appear above Sowmya.
> Can we implement this?
>
>
>


Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Modou DIA
Hi everybody,

I'm a newbie with Solr technologies, but in the past I worked with Lucene
and another solution similar to Solr.
I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
a Cocoon 2.1 application.

I want to know if it's possible to store (without indexing) a field
containing an XML sequence. I mean a field which can store XML data in
the index without losing XPath information.

For example, this is a document to index:

<add>
  <doc>
    <field name="id">id_1</field>
    <field name="title">testing</field>
    <field name="subdoc">
      <sub>
        testing
      </sub>
    </field>
  </doc>
  ...
</add>

As you can see, the field named subdoc contains an XML sequence.

So, when I query the indexes, I want to retrieve the data in subdoc
and preserve the XML markup.

Thank you for your help.
-- 
--
| Modou DIA
| modo...@gmail.com
--


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
That's neat, but wouldn't that run on every commit? How would you use it
to, say, back up once a day?
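
If once a day is the goal, one option is to skip the commit hook and drive
the replication handler from cron instead (a sketch; host and port are
assumptions):

# crontab entry: take a replication-handler backup at 02:00 every day
0 2 * * * curl -s "http://localhost:8983/solr/replication?command=backup" > /dev/null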

Upayavira

On Thu, Dec 20, 2012, at 01:57 PM, Markus Jelsma wrote:
> You can use the postCommit event in updateHandler to execute a task. 
>  
> -Original message-
> > From:Upayavira 
> > Sent: Thu 20-Dec-2012 14:45
> > To: solr-user@lucene.apache.org
> > Subject: Re: Pause and resume indexing on SolR 4 for backups
> > 
> > The backup directory should just be a clone of the index files. I'm
> > curious to know whether it is a cp -r or a cp -lr that the replication
> > handler produces.
> > 
> > You would prevent commits by telling your app not to commit. That is,
> > Solr only commits when it is *told* to.
> > 
> > Unless you use autocommit, in which case I guess you could monitor your
> > > logs for the last commit, and do your backup 10 seconds after that.
> > 
> > Upayavira
> > 
> > On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> > > On 20/12/12 11:58, Upayavira wrote:
> > > > I've never used it, but the replication handler has an option:
> > > >
> > > >http://master_host:port/solr/replication?command=backup
> > > >
> > > > Which will take you a backup.
> > > I've looked at that this morning as suggested by Markus Jelsma. Looks 
> > > good, but I'll have to work out how to use the resultant backup 
> > > directory. I've been dealing with another unrelated issue in the 
> > > meantime and I haven't had a chance to look for any docu so far.
> > > > Also something to note, if you don't want to use the above, and you are
> > > > running on Unix, you can create fast 'hard link' clones of lucene
> > > > indexes. Doing:
> > > >
> > > > cp -lr data data.bak
> > > >
> > > > will copy your index instantly. If you can avoid doing this when a
> > > > commit is happening, then you'll have a good index copy, that will take
> > > > no space on your disk and be made instantly. This is because it just
> > > > copies the directory structure, not the files themselves, and given
> > > > files in a lucene index never change (they are only ever deleted or
> > > > replaced), this works as a good copy technique for backing up.
> > > That's the approach that Shawn Heisey proposed, and what I've been 
> > > working towards,  but it still leaves open the question of how to 
> > > *pause* SolR or prevent commits during the backup (otherwise we have a 
> > > potential race condition).
> > > 
> > > -Andy
> > > 
> > > 
> > > -- 
> > > Andy D'Arcy Jewell
> > > 
> > > SysMicro Limited
> > > Linux Support
> > > E:  andy.jew...@sysmicro.co.uk
> > > W:  www.sysmicro.co.uk
> > > 
> > 


RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the postCommit event in updateHandler to execute a task. 
 
-Original message-
> From:Upayavira 
> Sent: Thu 20-Dec-2012 14:45
> To: solr-user@lucene.apache.org
> Subject: Re: Pause and resume indexing on SolR 4 for backups
> 
> The backup directory should just be a clone of the index files. I'm
> curious to know whether it is a cp -r or a cp -lr that the replication
> handler produces.
> 
> You would prevent commits by telling your app not to commit. That is,
> Solr only commits when it is *told* to.
> 
> Unless you use autocommit, in which case I guess you could monitor your
> > logs for the last commit, and do your backup 10 seconds after that.
> 
> Upayavira
> 
> On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> > On 20/12/12 11:58, Upayavira wrote:
> > > I've never used it, but the replication handler has an option:
> > >
> > >http://master_host:port/solr/replication?command=backup
> > >
> > > Which will take you a backup.
> > I've looked at that this morning as suggested by Markus Jelsma. Looks 
> > good, but I'll have to work out how to use the resultant backup 
> > directory. I've been dealing with another unrelated issue in the 
> > meantime and I haven't had a chance to look for any docu so far.
> > > Also something to note, if you don't want to use the above, and you are
> > > running on Unix, you can create fast 'hard link' clones of lucene
> > > indexes. Doing:
> > >
> > > cp -lr data data.bak
> > >
> > > will copy your index instantly. If you can avoid doing this when a
> > > commit is happening, then you'll have a good index copy, that will take
> > > no space on your disk and be made instantly. This is because it just
> > > copies the directory structure, not the files themselves, and given
> > > files in a lucene index never change (they are only ever deleted or
> > > replaced), this works as a good copy technique for backing up.
> > That's the approach that Shawn Heisey proposed, and what I've been 
> > working towards,  but it still leaves open the question of how to 
> > *pause* SolR or prevent commits during the backup (otherwise we have a 
> > potential race condition).
> > 
> > -Andy
> > 
> > 
> > -- 
> > Andy D'Arcy Jewell
> > 
> > SysMicro Limited
> > Linux Support
> > E:  andy.jew...@sysmicro.co.uk
> > W:  www.sysmicro.co.uk
> > 
> 


RE: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Michael Ryan
In our system (using 3.6), it is displayed on /solr/admin/. I'd guess that the
value in solr.xml overrides the one in schema.xml, but I'm not sure.

-Michael

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lucene.apache.org
Subject: Where does schema.xml's schema/@name displays?

Hello,

In the schema.xml, we have a name attribute on the root node. The documentation
says it is for display purposes only. But for display where?

It seems that the admin console uses the name in the solr.xml file instead.
And deleting the name attribute does not seem to cause any problems either.

The reason I ask is because I am writing an explanation example which involves
the schema.xml config file being copied and modified over and over again. If
@name is significant, I need to mention changing it. If not, I will just delete
it altogether.

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Upayavira
Personally I have never given it any attention, so I suspect it doesn't
matter much.

Upayavira

On Thu, Dec 20, 2012, at 05:08 AM, Alexandre Rafalovitch wrote:
> Hello,
> 
> In the schema.xml, we have a name attribute on the root node. The
> documentation says it is for display purposes only. But for display where?
> 
> It seems that the admin console uses the name in the solr.xml file
> instead.
> And deleting the name attribute does not seem to cause any problems
> either.
> 
> The reason I ask is because I am writing an explanation example which
> involves the schema.xml config file being copied and modified over and over
> again. If @name is significant, I need to mention changing it. If not, I
> will just delete it altogether.
> 
> Regards,
>Alex.
> 
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.

Upayavira

On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> On 20/12/12 11:58, Upayavira wrote:
> > I've never used it, but the replication handler has an option:
> >
> >http://master_host:port/solr/replication?command=backup
> >
> > Which will take you a backup.
> I've looked at that this morning as suggested by Markus Jelsma. Looks 
> good, but I'll have to work out how to use the resultant backup 
> directory. I've been dealing with another unrelated issue in the 
> meantime and I haven't had a chance to look for any docu so far.
> > Also something to note, if you don't want to use the above, and you are
> > running on Unix, you can create fast 'hard link' clones of lucene
> > indexes. Doing:
> >
> > cp -lr data data.bak
> >
> > will copy your index instantly. If you can avoid doing this when a
> > commit is happening, then you'll have a good index copy, that will take
> > no space on your disk and be made instantly. This is because it just
> > copies the directory structure, not the files themselves, and given
> > files in a lucene index never change (they are only ever deleted or
> > replaced), this works as a good copy technique for backing up.
> That's the approach that Shawn Heisey proposed, and what I've been 
> working towards,  but it still leaves open the question of how to 
> *pause* SolR or prevent commits during the backup (otherwise we have a 
> potential race condition).
> 
> -Andy
> 
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 11:58, Upayavira wrote:

I've never used it, but the replication handler has an option:

   http://master_host:port/solr/replication?command=backup

Which will take you a backup.
I've looked at that this morning as suggested by Markus Jelsma. Looks 
good, but I'll have to work out how to use the resultant backup 
directory. I've been dealing with another unrelated issue in the 
meantime and I haven't had a chance to look for any docu so far.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
That's the approach that Shawn Heisey proposed, and what I've been 
working towards,  but it still leaves open the question of how to 
*pause* SolR or prevent commits during the backup (otherwise we have a 
potential race condition).


-Andy


--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Dynamic modification of field value

2012-12-20 Thread Upayavira
Which strikes me as the right way to go.

Upayavira

On Thu, Dec 20, 2012, at 12:30 PM, AlexeyK wrote:
> Implemented it with http://wiki.apache.org/solr/DocTransformers.
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Dynamic-modification-of-field-value-tp4028234p4028301.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Finding the last committed record in SOLR 4

2012-12-20 Thread Upayavira
I cannot see how SolrJ and the admin UI would return different results.
Could you run exactly the same query on both and show what you get here?

Upayavira

On Thu, Dec 20, 2012, at 06:17 AM, Joe wrote:
> I'm using SOLR 4 for an application where I need to search the index
> soon after inserting records.
> 
> I'm using the solrj code below to get the last ID in the index. However,
> I noticed that the last ID I see when I execute a query through the Solr
> web admin is often lagging behind this, and that my searches are not
> including all documents up to the last ID I get from the code snippet
> below. I'm guessing this is because of delays in hard commits. I don't
> need to switch to soft commits yet. I just want to make sure that I get
> the ID of the last searchable document. Is this possible to do?
> 
> 
>    SolrQuery query = new SolrQuery();
>    query.set("qt","/select");      // default select handler
>    query.setQuery( "*:*" );        // match all documents
>    query.setFields("id");          // only the id field is needed
>    query.set("rows","1");          // top document only
>    query.set("sort","id desc");    // highest (latest) id first
> 
>    QueryResponse rsp = m_Server.query( query );
>    SolrDocumentList docs = rsp.getResults();
>    SolrDocument doc = docs.get(0);
>    long id = (Long) doc.getFieldValue("id");
>
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Finding-the-last-committed-record-in-SOLR-4-tp4028235.html
> Sent from the Solr - User mailing list archive at Nabble.com.
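
If the lag comes from pending hard commits, one workaround (a sketch,
assuming the default /update handler on localhost) is to force a commit,
which opens a new searcher, before reading the last ID:

curl "http://localhost:8983/solr/update?commit=true"

After that, the SolrJ snippet above and the admin UI should see the same
last searchable document.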


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
I've never used it, but the replication handler has an option:

  http://master_host:port/solr/replication?command=backup 

Which will take you a backup.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
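
Combining the two, a periodic backup might look roughly like this (a
sketch; host, port, and data directory are assumptions):

# ask the replication handler for a snapshot (written as a snapshot.*
# directory, under the data dir by default)
curl -s "http://master_host:8983/solr/replication?command=backup"

# or the hard-link variant: near-instant and takes no extra space,
# best run when no commit is in flight
cp -lr /var/lib/solr/data/index /var/lib/solr/data/index.bak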

Upayavira

On Thu, Dec 20, 2012, at 10:34 AM, Markus Jelsma wrote:
> You can use the replication handler to fetch a complete snapshot of the
> index over HTTP.
> http://wiki.apache.org/solr/SolrReplication#HTTP_API
>  
>  
> -Original message-
> > From:Andy D'Arcy Jewell 
> > Sent: Thu 20-Dec-2012 11:23
> > To: solr-user@lucene.apache.org
> > Subject: Pause and resume indexing on SolR 4 for backups
> > 
> > Hi all.
> > 
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would allow.
> > 
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, be we are not yet live.
> > 
> > I can think of the following ways to do this, each with various downsides:
> > 
> > 1) Just backup the existing index files whilst indexing continues
> >  + Easy
> >  + Fast
> >  - Incomplete
> >  - Potential for corruption? (e.g. partial files)
> > 
> > 2) Stop/Start Tomcat
> >  + Easy
> >  - Very slow and I/O, CPU intensive
> >  - Client gets errors when trying to connect
> > 
> > 3) Block/unblock SolR port with IpTables
> >  + Fast
> >  - Client gets errors when trying to connect
> >  - Have to wait for existing transactions to complete (not sure how, 
> > maybe watch socket FD's in /proc)
> > 
> > 4) Pause/Restart SolR service
> >  + Fast ? (hopefully)
> >  - Client gets errors when trying to connect
> > 
> > In any event, the web app will have to gracefully handle unavailability 
> > of SolR, probably by displaying a "down for maintenance" message, but 
> > this should preferably be only a very short amount of time.
> > 
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> > 
> > Thanks for any input you can provide!
> > 
> > -Andy
> > 
> > -- 
> > Andy D'Arcy Jewell
> > 
> > SysMicro Limited
> > Linux Support
> > E:  andy.jew...@sysmicro.co.uk
> > W:  www.sysmicro.co.uk
> > 
> > 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 16:14, Andy D'Arcy Jewell
 wrote:
[...]
> It's attached to a web-app, which accepts uploads and will be available
> 24/7, with a global audience, so "pausing" it may be rather difficult (though I
> may put this to the developer - it may for instance be possible if he has a
> small number of choke points for input into SolR).
[...]

It adds work for the web developer, but one could pause indexing,
put indexing requests into some kind of a queuing system, do the
backup, and flush the queue when the backup is done.

Regards,
Gora


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 10:24, Gora Mohanty wrote:


Unless I am missing something, the index is only being written to
when you are adding/updating the index. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora
It's attached to a web-app, which accepts uploads and will be available 
24/7, with a global audience, so "pausing" it may be rather difficult 
(though I may put this to the developer - it may for instance be possible 
if he has a small number of choke points for input into SolR).


Thanks.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the replication handler to fetch a complete snapshot of the index 
over HTTP.
http://wiki.apache.org/solr/SolrReplication#HTTP_API
 
 
-Original message-
> From:Andy D'Arcy Jewell 
> Sent: Thu 20-Dec-2012 11:23
> To: solr-user@lucene.apache.org
> Subject: Pause and resume indexing on SolR 4 for backups
> 
> Hi all.
> 
> Can anyone advise me of a way to pause and resume SolR 4 so I can 
> perform a backup? I need to be able to revert to a usable (though not 
> necessarily complete) index after a crash or other "disaster" more 
> quickly than a re-index operation would allow.
> 
> I can't yet afford the "extravagance" of a separate SolR replica just 
> for backups, and I'm not sure if I'll ever have the luxury. I'm 
> currently running with just one node, but we are not yet live.
> 
> I can think of the following ways to do this, each with various downsides:
> 
> 1) Just backup the existing index files whilst indexing continues
>  + Easy
>  + Fast
>  - Incomplete
>  - Potential for corruption? (e.g. partial files)
> 
> 2) Stop/Start Tomcat
>  + Easy
>  - Very slow and I/O, CPU intensive
>  - Client gets errors when trying to connect
> 
> 3) Block/unblock SolR port with IpTables
>  + Fast
>  - Client gets errors when trying to connect
>  - Have to wait for existing transactions to complete (not sure how, 
> maybe watch socket FD's in /proc)
> 
> 4) Pause/Restart SolR service
>  + Fast ? (hopefully)
>  - Client gets errors when trying to connect
> 
> In any event, the web app will have to gracefully handle unavailability 
> of SolR, probably by displaying a "down for maintenance" message, but 
> this should preferably be only a very short amount of time.
> 
> Can anyone comment on my proposed solutions above, or provide any 
> additional ones?
> 
> Thanks for any input you can provide!
> 
> -Andy
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 15:46, Andy D'Arcy Jewell
 wrote:
> Hi all.
>
> Can anyone advise me of a way to pause and resume SolR 4 so I can perform a
> backup? I need to be able to revert to a usable (though not necessarily
> complete) index after a crash or other "disaster" more quickly than a
> re-index operation would yield.
[...]

Unless I am missing something, the index is only being written to
when you are adding/updating the index. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora


Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other "disaster" more 
quickly than a re-index operation would allow.


I can't yet afford the "extravagance" of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
currently running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various downsides:

1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow and I/O, CPU intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure how, 
maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability 
of SolR, probably by displaying a "down for maintenance" message, but 
this should preferably be only a very short amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Data from deleted from Solr (Solr cloud)

2012-12-20 Thread John Nielsen
Yeah, I ran into this issue myself with solr-4.0.0.

To fix it, I had to compile my own version from the solr-4x branch. That
is, I assume it's fixed as I have been unable to replicate it after the
switch.

I'm afraid you will have to reindex your data.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


On Wed, Dec 19, 2012 at 5:08 PM, shreejay  wrote:

> Hi All,
>
> I have a SolrCloud instance with 3 shards. Each shard has 2 instances (2
> servers, each running an instance of Solr).
>
> Let's say I had instance1 and instance2 in shard1. At some point, instance2
> went down due to OOM (out of memory). instance1 for some reason was not
> replicating the data properly, and when it became the leader it had only
> around 1% of the data that instance2 had. I restarted instance2 and hoped
> that instance1 would replicate from instance2, but instead instance2
> replicated from instance1, and ended up deleting the original index folder
> it had. There were around 2 million documents in that instance.
>
> Can any of the SolrCloud users give any hints on whether I can recover this data?
>
>
>
>
> --Shreejay
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Data-from-deleted-from-Solr-Solr-cloud-tp4028055.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>