Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-23 Thread pcrao
Hi Mikhail Khludnev, 

Thank you for your help.

Let me explain the JVM scenario.
The JVM in which Tomcat (and therefore StreamingUpdateSolrServer) runs is not
restarted between indexing runs, whereas the EmbeddedSolrServer gets a fresh
JVM instance (a new process) every time.
In this scenario the index is getting corrupted.

If I restart Tomcat (i.e. restart the JVM in which StreamingUpdateSolrServer is
running) after each indexing run, the index does not get corrupted. However,
this is not a viable option for us because Solr will not be available to users
during the restart.
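
[Editor's note: a minimal SolrJ sketch of the setup described above. The paths,
URL, and core name are hypothetical; the suspected mechanism is two JVMs each
holding an IndexWriter on the same index directory.]

import java.io.File;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.core.CoreContainer;

public class TwoJvmSketch {
  public static void main(String[] args) throws Exception {
    // JVM A (long-lived Tomcat): an HTTP client only; the index itself is
    // opened and owned by the Solr webapp running inside Tomcat.
    StreamingUpdateSolrServer http =
        new StreamingUpdateSolrServer("http://localhost:8080/solr", 20, 4);

    // JVM B (a fresh process per indexing run): opens the same solr home
    // directly, creating a second writer on the same index files -- a
    // plausible source of the corruption if both run at the same time.
    CoreContainer container = new CoreContainer();
    container.load("/opt/solr/home", new File("/opt/solr/home/solr.xml"));
    EmbeddedSolrServer embedded = new EmbeddedSolrServer(container, "core0");
  }
}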

Let me know if you have any more thoughts on this.
If you don't, can you also let me know how I can seek help from others?

Thanks again,
PC Rao.



Exception fixing docBase for context [error in opening zip file]

2012-04-23 Thread Yung-chung Lin
Hi,

I am experiencing a problem starting Solr with Tomcat 6.

My system:  Ubuntu 11.

ii  tomcat66.0.32-5ubuntu1.2
Servlet and JSP engine
ii  openjdk-6-jre  6b23~pre11-0ubuntu1.11.10.2
OpenJDK Java runtime, using Hotspot JIT

I'm using the nightly build war
file: apache-solr-4.0-2012-04-21_08-25-44.war

Can anyone give me a pointer? Thanks.

Below is the error message I got.

2012/4/23 02:24:42 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
2012/4/23 02:24:42 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 575 ms
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.32
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor ROOT.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.ContextConfig init
SEVERE: Exception fixing docBase for context [/solr]
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:131)
    at java.util.jar.JarFile.<init>(JarFile.java:150)
    at java.util.jar.JarFile.<init>(JarFile.java:87)
    at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:90)
    at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:66)
    at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:86)
    at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
    at org.apache.catalina.startup.ExpandWar.expand(ExpandWar.java:148)
    at org.apache.catalina.startup.ContextConfig.fixDocBase(ContextConfig.java:886)
    at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:1021)
    at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:279)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.StandardContext.init(StandardContext.java:5707)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4449)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Invalid or unreadable WAR file :
/home/yclin/Projects/search/search/solr/wars/apache-solr-4.0-2012-04-21_08-25-44.war
    at org.apache.naming.resources.WARDirContext.setDocBase(WARDirContext.java:130)
    at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4320)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4489)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at

Re: # open files with SolrCloud

2012-04-23 Thread Sami Siren
On Sat, Apr 21, 2012 at 9:57 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 I can reproduce some kind of searcher leak issue here, even w/o
 SolrCloud, and I've opened
 https://issues.apache.org/jira/browse/SOLR-3392

With the fix integrated, I do not see the leaking problem anymore with
my setup, so it seems to be working now.

--
 Sami Siren


Re: 'Error 404: missing core name in path' in Solr

2012-04-23 Thread Dan Tuffery
Looks like you need to select a core name in the admin UI before selecting
search. Have a look at the solr.xml file in your Solr home directory: which
cores are defined?

Solr is expecting the core name in the URL:

http://localhost:8080/solr/CORENAME/admin/ (not http://localhost:8080/solr/admin/)
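
[Editor's note: a hedged SolrJ sketch of the same point; the core name "core0"
is illustrative.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CoreQuery {
  public static void main(String[] args) throws Exception {
    // In a multi-core setup the base URL must include the core name.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
    System.out.println(
        server.query(new SolrQuery("*:*")).getResults().getNumFound());
  }
}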



On Mon, Apr 23, 2012 at 12:58 AM, vasuj vasu.j...@live.in wrote:

 I used

 server.deleteByQuery("*:*"); // CAUTION: deletes everything!

 in my Solr indexing program (screenshot:
 http://lucene.472066.n3.nabble.com/file/n3931194/Screenshot_%2847%29.png).
 Since then I am receiving the error below whenever I go to

 http://localhost:8080/solr/admin/

 and press search with a query string:

 The error is

 HTTP Status 400 - Missing solr core name in path

 type Status report

 message Missing solr core name in path

 description The request sent by the client was syntactically incorrect
 (Missing solr core name in path).

 Apache Tomcat/7.0.21




Re: 'Error 404: missing core name in path' in Solr

2012-04-23 Thread Jan Høydahl
Hi,

Perhaps your search server uses a multi-core setup? In that case you need the
core name as part of the URL:
http://wiki.apache.org/solr/CoreAdmin#Example

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com




Re: StandardTokenizer and domain names containing digits

2012-04-23 Thread Alex Willmer
Steven A Rowe sarowe at syr.edu writes:
 StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules
 from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29:
 http://www.unicode.org/reports/tr29/tr29-17.html#Word_Boundaries.
 These rules don't include recognition of URLs or domain names.

 Lucene/Solr includes another tokenizer that does recognize URLs and domain
 names, in addition to the UAX#29 Word Boundary rules: UAX29URLEmailTokenizer
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory.
 (Stand-alone domain names are recognized as URLs.)

 My suggestion is that you add a filter (for both the indexing and querying)
 that splits tokens containing periods:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory,
 something like (untested!):

 <filter class="solr.WordDelimiterFilterFactory"
         splitOnCaseChange="0"
         splitOnNumerics="0"
         stemEnglishPossessive="0"
         generateWordParts="1"
         preserveOriginal="1" />

Steve, thank you very much for this reply, it helped immensely. In the end I've
gone with your suggestion, plus a swap of StandardTokenizer for
UAX29URLEmailTokenizer and setting autoGeneratePhraseQueries="true". The
fieldType now looks like:

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="1"
            preserveOriginal="1" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true"
            expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="1"
            preserveOriginal="1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

autoGeneratePhraseQueries is set so that the tokens generated in the query
analyzer behave more like tokens from a space-delimited query. So
"ns1.define.logica.com" finds a similar set of documents to "ns1 define logica
com" (i.e. ns1 AND define AND logica AND com), rather than ns1 OR define OR
logica OR com.
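
[Editor's note: to see the tokenizer difference directly, a minimal Lucene
3.6-era sketch; the Version constant is an assumption and the expected tokens
in the comments are illustrative, not output from the original setup.]

import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenizerCompare {
  static void dump(Tokenizer tok) throws Exception {
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    tok.reset();
    while (tok.incrementToken()) System.out.print("[" + term + "] ");
    System.out.println();
    tok.end();
    tok.close();
  }

  public static void main(String[] args) throws Exception {
    String host = "ns1.define.logica.com";
    // UAX#29 joins letter.letter but breaks after the digit, so
    // StandardTokenizer should yield something like [ns1] [define.logica.com]
    dump(new StandardTokenizer(Version.LUCENE_36, new StringReader(host)));
    // UAX29URLEmailTokenizer should keep the whole hostname as one
    // URL-typed token: [ns1.define.logica.com]
    dump(new UAX29URLEmailTokenizer(Version.LUCENE_36, new StringReader(host)));
  }
}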

Many thanks, Alex



Re: Solr Hanging

2012-04-23 Thread Trym R. Møller

Hi

I have succeeded in reproducing the scenario with two Solr instances
running. They cover a single collection with two slices and two replicas,
two cores in each Solr instance. I have changed the number of threads
that Jetty is allowed to use as follows:

<New class="org.mortbay.thread.QueuedThreadPool">
  <Set name="minThreads">3</Set>
  <Set name="maxThreads">3</Set>
  <Set name="lowThreads">0</Set>
</New>

Indexing a single document works fine, but when concurrently indexing 10
documents, Solr frequently hangs.
I know that Jetty by default is allowed to use 10,000 threads, but in my
other setup all 10,000 allowed threads end up used on a single Solr
instance (I have 7 Solr instances) after some days, and the hanging
scenario occurs.


I'm not sure that just adjusting the allowed number of threads is the
best solution, and I would like some input on what to expect and whether
there are other things I can adjust.
My setup is, as written before, 7 Solr instances handling a single
collection with 28 leaders and 28 replicas distributed fairly across the
Solrs (8 cores on each Solr).


Thanks for any input.

Best regards Trym


Den 19-04-2012 14:36, Yonik Seeley skrev:

On Thu, Apr 19, 2012 at 4:25 AM, Trym R. Møller t...@sigmat.dk wrote:

Hi

I am using Solr trunk and have 7 Solr instances running with 28 leaders and
28 replicas for a single collection.
After indexing a while (a couple of days) the solrs start hanging and doing
a thread dump on the jvm I see blocked threads like the following:
Thread 2369: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame;
information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
line=158 (Compiled frame)
 -
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
@bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399
(Compiled frame)
 - java.util.concurrent.ExecutorCompletionService.take() @bci=4, line=164
(Compiled frame)
 - org.apache.solr.update.SolrCmdDistributor.checkResponses(boolean)
@bci=27, line=350 (Compiled frame)
 - org.apache.solr.update.SolrCmdDistributor.finish() @bci=18, line=98
(Compiled frame)
 - org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish()
@bci=4, line=299 (Compiled frame)
 - org.apache.solr.update.processor.DistributedUpdateProcessor.finish()
@bci=1, line=817 (Compiled frame)
...
 - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=25, line=582
(Interpreted frame)

I read the stack trace as my indexing client has indexed a document and this
Solr is now waiting for the replica? to respond before returning an answer
to the client.

Correct.  What's the full stack trace like on both a leader and replica?
We need to know what the replica is blocking on.

What version of trunk are you using?

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Exception fixing docBase for context [error in opening zip file]

2012-04-23 Thread Yung-chung Lin
Hi,



I have figured this out on my own. It was just a permission problem.
The error "Exception fixing docBase for context ... java.util.zip.ZipException:
error in opening zip file" can be fixed by changing the permissions of the
parent paths to 0755:

find PARENT_PATH -type d -exec chmod 0755 {} \;
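
[Editor's note: a quick hedged check that the WAR can actually be opened as a
zip; run it as the same user Tomcat runs as.]

import java.util.zip.ZipFile;

public class WarCheck {
  public static void main(String[] args) throws Exception {
    // Throws ZipException ("error in opening zip file") if the file cannot
    // be read or is not a valid zip -- the same failure ExpandWar hit above.
    ZipFile war = new ZipFile(
        "/home/yclin/Projects/search/search/solr/wars/apache-solr-4.0-2012-04-21_08-25-44.war");
    System.out.println(war.size() + " entries -- WAR is readable");
    war.close();
  }
}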


Yung-chung Lin

2012/4/23 ☼ 林永忠 ☼ (Yung-chung Lin) henearkrx...@gmail.com

 Hi,

 I am experiencing a problem starting solr with Tomcat 6.

 My system:  Ubuntu 11.

 ii  tomcat66.0.32-5ubuntu1.2
 Servlet and JSP engine
 ii  openjdk-6-jre  6b23~pre11-0ubuntu1.11.10.2
 OpenJDK Java runtime, using Hotspot JIT

 I'm using the nightly build war
 file: apache-solr-4.0-2012-04-21_08-25-44.war

 Can anyone give me a pointer? Thanks.


Facing problem to integrate UIMA in SOLR

2012-04-23 Thread dsy99
Hello all,
I am facing a problem integrating UIMA into Solr.

I followed the steps below, provided in the README file shipped along with
UIMA, to integrate it into Solr.

Step 1.
I set <lib/> tags in solrconfig.xml appropriately to point to the jar files:

   <lib dir="../../contrib/uima/lib" />
   <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

Step 2.
I modified my schema.xml, adding the fields I wanted to hold the metadata,
specifying proper values for the type, indexed, stored and multiValued
options as follows:

  <field name="language" type="string" indexed="true" stored="true"
         required="false"/>
  <field name="concept" type="string" indexed="true" stored="true"
         multiValued="true" required="false"/>
  <field name="sentence" type="text" indexed="true" stored="true"
         multiValued="true" required="false"/>

Step 3.
I modified my solrconfig.xml, adding the following snippet:

  <updateRequestProcessorChain name="uima" default="true">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
          <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
        </lst>
        <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
        <bool name="ignoreErrors">true</bool>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
            <lst name="mapping">
              <str name="feature">text</str>
              <str name="field">concept</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
            <lst name="mapping">
              <str name="feature">language</str>
              <str name="field">language</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.SentenceAnnotation</str>
            <lst name="mapping">
              <str name="feature">coveredText</str>
              <str name="field">sentence</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Step 4.
And finally I created a new UpdateRequestHandler with the following:

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>

Further, I indexed a Word file called test.docx using the following command:

curl "http://localhost:8983/solr/update/extract?fmap.content=content&literal.id=doc47&commit=true"
-F "file=@test.docx"

When I searched for the same document with the
http://localhost:8983/solr/select?q=id:doc47 command, I got the following
result, i.e. I am not getting the additional UIMA fields in the response:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="author">divakar</str>
    <arr name="content_type">
      <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
    </arr>
    <str name="id">doc47</str>
    <date name="last_modified">2012-04-18T14:19:00Z</date>
  </doc>
</result>


Can anyone help me fix this problem?

With regards and thanks,
Divakar




Performance problem with DIH in solr 3.3

2012-04-23 Thread Pravin Agrawal
Hi All,

I am using the delta-import handler (Solr 3.3) to index data from my database
(19 tables).
  The total number of Solr documents created from these 19 tables is 444.
  The total number of requests sent to the data source during a clean full
import is 91,083.

My problem is that DIH makes too many calls and puts load on my database.
  1. Can we batch these calls?
  2. Can we use a view instead? If yes, can I get some examples of using a
view with DIH?
  3. What kinds of locks does Solr DIH acquire while querying the DB?

Note: we are using both Full-import and delta-import handler.

Thanks in advance
Pravin Agrawal



Re: null pointer error with solr deduplication

2012-04-23 Thread Mark Miller
A better error would be nicer.

In the past, when I have had docs with the same id on multiple shards, I
never saw an NPE problem. A lot has changed since then though. I guess, to
me, checking if the id is stored sticks out a bit more. Roughly based on
the stacktrace, it looks to me like it's not finding an id value and that
is causing the NPE.

If it's a legit problem we should probably make a JIRA issue about
improving the error message you end up getting.

-- 
- Mark

http://www.lucidimagination.com

On Sat, Apr 21, 2012 at 5:21 AM, Alexander Aristov 
alexander.aris...@gmail.com wrote:

 Hi

 I might be wrong, but it's your responsibility to put unique doc IDs across
 shards.

 Read this page:

 http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

 particularly:

   - Documents must have a unique key, and the unique key must be stored
     (stored="true" in schema.xml).
   - *The unique key field must be unique across all shards.* If docs with
     duplicate unique keys are encountered, Solr will make an attempt to
     return valid results, but the behavior may be non-deterministic.

 So Solr behaves as it should :) _unexpectedly_

 But I agree, in the sense that there should be no error, especially not an
 NPE.

 Best Regards
 Alexander Aristov


 On 21 April 2012 03:42, Peter Markey sudoma...@gmail.com wrote:

  Hello,
 
  I have been trying out deduplication in Solr by following
  http://wiki.apache.org/solr/Deduplication. I have defined a signature
  field to hold the values of the signature created based on a few other
  fields in a document, and the idea seems to work like a charm in a single
  Solr instance. But when I have multiple cores and try to do a distributed
  search (
  http://localhost:8080/solr/core0/select?q=*&shards=localhost:8080/solr/dedupe,localhost:8080/solr/dedupe2&facet=true&facet.field=doc_id
  ) I get the error pasted below. While a normal search (with just q) works
  fine, the facet/stats queries seem to be the culprit. The doc_id contains
  duplicate ids since I'm testing with the same set of documents indexed in
  both cores (dedupe, dedupe2). Any insights would be highly appreciated.
 
  Thanks
 
 
 
  20-Apr-2012 11:39:35 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:887)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:633)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
 



Synonyms file in solr

2012-04-23 Thread ggggGuys
I have some problems with the synonyms file; it seems I can't make it work
the way I'd want.

Here is an example:

I have these words: cat, animal, dog, living thing, baby shark.

If I search for animal or animals, I'd like to have the results for cat,
animal, dog, and baby shark, as well as their plurals cats, dogs, animals,
and baby sharks.

If I search for cat, I only want the results with cat or cats. Same for dog.

If I search for living thing, I want the results with living thing, living
things, animal or animals; so no dogs, cats...

So the words are in a hierarchy: living thing(s) -> animal(s) -> [dog(s),
cat(s), baby shark(s)]
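
[Editor's note: one way to get this asymmetric, hierarchical behavior is
one-way ("=>") mappings applied at index time only, so each specific term also
indexes its ancestors. A hedged sketch of synonyms.txt; plurals are assumed to
be handled separately, e.g. by a stemmer.]

# index-time one-way expansions: apply this file in the index analyzer only,
# and do NOT apply synonyms in the query analyzer
cat => cat, animal, living thing
dog => dog, animal, living thing
baby shark => baby shark, animal, living thing
animal => animal, living thing

With this, a query for cat matches only documents that contained cat, while a
query for animal also matches documents that contained cat, dog, or baby
shark, and living thing matches everything.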

I've tried a lot of things but I can't get the results I want, and I really
need your help :-(




Re: Solr Hanging

2012-04-23 Thread Mark Miller
Perhaps related is 
http://www.lucidimagination.com/search/document/6d0e168c82c86a38#45c945b2de6543f4


- Mark Miller
lucidimagination.com

Re: Solr Hanging

2012-04-23 Thread Mark Miller
And see https://issues.apache.org/jira/browse/SOLR-683 as it also may be 
related or have helpful info...

- Mark Miller
lucidimagination.com

RE: StandardTokenizer and domain names containing digits

2012-04-23 Thread Steven A Rowe
Hi Alex,

Thanks for reporting back with concrete details of what worked for you - very 
helpful for others with similar projects.

Steve




Re: The index speed in the solr

2012-04-23 Thread Erick Erickson
Hard to say. Here's the basic approach I'd use to try to narrow it down:
1> Take out the n-grams. What does that do to your speed?
2> Are you committing very often? Lengthen the interval if so.
3> Posting is probably not the most performant thing in the world.
   Consider using SolrJ.
4> What does a document look like? Are they structured docs
   (Word, PDF, etc.)? If so, try offloading that processing to client machines.

Basically, you haven't given enough information to make much
of a guess here...

50 hours is a really long time for 2M docs though, so something
doesn't seem right unless the docs are really unusual.

If you need to offload the structured docs, here's a way to
get started:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
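
[Editor's note: a minimal hedged SolrJ sketch of the idea; the URL, field
names, and batch sizes are illustrative.]

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrJIndexer {
  public static void main(String[] args) throws Exception {
    // Queue up to 100 docs and push with 4 background threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8988/solr", 100, 4);
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc" + i);
      doc.addField("text", "...document body here...");
      server.add(doc); // buffered and sent in the background
    }
    server.commit(); // one commit at the end, not per document
  }
}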

Best
Erick

On Sun, Apr 22, 2012 at 9:58 PM, neosky neosk...@yahoo.com wrote:
 It takes me 50 hours to index a 9 GB file in total (about 2,000,000
 documents) with an n-gram filter from min=6 to max=10; my token before the
 n-gram filter is long (not a word; at most 300,000 bytes including
 whitespace). I split it into 4 files and used post.sh to update them at the
 same time. I also tried writing a Lucene indexer myself (single thread); the
 time is almost the same. I would like to know what the general bottleneck
 for indexing in Solr is. Doesn't Solr handle index update requests
 concurrently?

 1.
 Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
 Current
                                 Dload  Upload   Total   Spent    Left
 Speed
  51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28
 0
 2.
 Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
 Current
                                 Dload  Upload   Total   Spent    Left
 Speed
  62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15
 76629
 3.
 Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
 Current
                                 Dload  Upload   Total   Spent    Left
 Speed
  65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17
 25537
 4.
 Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
 Current
                                 Dload  Upload   Total   Spent    Left
 Speed
  58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06
 81435




Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception

2012-04-23 Thread sivaprasad
Hi,

When I am trying to index 16 million documents using the DataImportHandler, I
intermittently get the exception below and the indexing stops.

STACKTRACE:

java.io.EOFException: Can not read response from server. Expected to read 4
bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1997)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2411)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1360)
    at com.mysql.jdbc.MysqlIO.fetchRowsViaCursor(MysqlIO.java:4044)
    at com.mysql.jdbc.CursorRowProvider.fetchMoreRows(CursorRowProvider.java:396)
    at com.mysql.jdbc.CursorRowProvider.hasNext(CursorRowProvider.java:313)
    at com.mysql.jdbc.ResultSet.next(ResultSet.java:7296)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:77)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)


** END NESTED EXCEPTION **



Last packet sent to the server was 2 ms ago.
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2622)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1360)
    at com.mysql.jdbc.MysqlIO.fetchRowsViaCursor(MysqlIO.java:4044)
    at com.mysql.jdbc.CursorRowProvider.fetchMoreRows(CursorRowProvider.java:396)
    at com.mysql.jdbc.CursorRowProvider.hasNext(CursorRowProvider.java:313)
    at com.mysql.jdbc.ResultSet.next(ResultSet.java:7296)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
    ... 11 more

2012-04-23 08:25:35,693 SEVERE
[org.apache.solr.handler.dataimport.DataImporter] (Thread-21) Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.CommunicationsException: Communications link failure due to
underlying exception:


And db-config.xml has the below configuration:

<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/phpq" user="slrmgr"
            defaultFetchSize="30" useCursorFetch="true" autoReconnect="true"
            tcpKeepAlive="true" connectionTimeout="12" password="pqmgr123"
            batch-size="-1"/>

Any help on this is much appreciated.




Re: Using two repeater to rapidly switching Master and Slave (Replication)?

2012-04-23 Thread Jeevanandam

On 23-04-2012 10:28 am, A Vorderegger wrote:

 This setup would be highly convenient and perfect for the purpose of
 failing over the Master role, however it does not work for me. Resolving
 http://slave_host:port/solr/replication?command=enablepoll I am met with:

 <str name="status">ERROR</str><str name="message">No slave configured</str>

 no matter what order I enable polling / replication in. I am confident
 that I have set up my solrconfig.xml file exactly as described. Could you
 please further describe how this setup is successfully achieved? Thanks in
 advance.



Can you please share your repeater configuration (just the replication
handler definition)?

It looks like the master is enabled on the slave host,
and executing the enablepoll command on the master results in:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <str name="status">ERROR</str>
  <str name="message">No slave configured</str>
</response>


-Jeevanandam


Re: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception

2012-04-23 Thread Jeevanandam

On 23-04-2012 8:18 pm, sivaprasad wrote:

Hi,

When I am trying to index 16 million documents using the DataImportHandler,
I intermittently get the exception shown above and the indexing stops.


And the db-config.xml has the below configuration.


 <dataSource driver="com.mysql.jdbc.Driver"
             url="jdbc:mysql://localhost:3306/phpq" user="slrmgr"
             defaultFetchSize="30" useCursorFetch="true" autoReconnect="true"
             tcpKeepAlive="true" connectionTimeout="12" password="pqmgr123"
             batch-size="-1"/>

 Any help on this is much appreciated.






Sivaprasad,

Just a clarification about the batch size attribute: is it a typo, or really
what is in your db-config.xml?

The supported attribute name is batchSize="-1"
(http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource).

-Jeevanandam





RE: Performance problem with DIH in solr 3.3

2012-04-23 Thread Dyer, James
See this page for an alternate way to use DIH for delta updates that does not
generate n+1 SELECTs:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
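
[Editor's note: the pattern on that page boils down to a single parameterized
query; a hedged sketch, with illustrative table and column names.]

<!-- Run a "full import" whose query itself selects only changed rows,
     avoiding one SELECT per changed row. -->
<entity name="item" pk="ID"
        query="SELECT * FROM item
               WHERE '${dataimporter.request.clean}' != 'false'
                  OR last_modified &gt; '${dataimporter.last_index_time}'"/>

Delta runs are then triggered with /dataimport?command=full-import&clean=false,
so a single query fetches all changed rows.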

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: The index speed in the solr

2012-04-23 Thread Smiley, David W.

On Apr 23, 2012, at 9:27 AM, Erick Erickson wrote:

 50 hours is a really long time for 2M docs though, so something
 doesn't seem right unless the docs are really unusual.

Don't forget he's n-gramming ;-)  There's not much more demanding you could
ask of text analysis, except for throwing shingling in there too for good
measure [*].

Neosky, you should consider using Solr trunk, which has dramatic
multithreaded indexing performance improvements if your hardware is capable.
If you try trunk, use a large ramBufferSizeMB (say 2GB worth), but if you
stick with Solr 3.x, use 1GB. And finally, increasing your mergeFactor will
increase indexing performance at the expense of search speed. You could throw
in an optimize at the very end with maxSegments=10 or something to compensate.
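
[Editor's note: in solrconfig.xml terms, that advice looks roughly like this;
a hedged sketch, with exact values depending on hardware and version.]

<indexDefaults>
  <ramBufferSizeMB>1024</ramBufferSizeMB> <!-- ~1GB on 3.x; ~2048 on trunk -->
  <mergeFactor>20</mergeFactor>           <!-- faster indexing, slower search -->
</indexDefaults>

followed at the very end by an optimize such as
http://localhost:8983/solr/update?optimize=true&maxSegments=10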

~ David Smiley
[*] that was a joke

Spatial4j

2012-04-23 Thread Eric Grobler
Hello Solr Community,

We are interested in polygon spatial queries.
I believe that Spatial4j supports it.

Is there a Solr branch available that includes Spatial4j?
Will this be part of a future Solr release?

Thank you.

Best Regards
Ericz


Re: Solr Core Admin Question on Trunk

2012-04-23 Thread Jamie Johnson
So I believe I see the reason now.  Basically, in app.js we check
whether there is more than one core deployed to decide whether to show the
core admin or not.  I am not sure if this is intended, but I would
think this isn't what we want the default behavior to be.  Shouldn't we
always show the core admin menu option, so users can grow their Solr
instances without having to execute the core admin commands from curl
or something?

On Mon, Apr 23, 2012 at 11:04 AM, Jamie Johnson jej2...@gmail.com wrote:
 I just updated to the latest Solr nightly build to address the issue
 Yonik fixed in 3392 and have noticed that I no longer have a core
 admin button in my admin interface.  What specifically controls if
 this is shown or not?

 I am also not ruling out the chance I've messed something up but I was
 wondering if there are a set of conditions that controls if this is
 shown or not.


Re: Spatial4j

2012-04-23 Thread Smiley, David W.
Ericz,

See this issue:  https://issues.apache.org/jira/browse/SOLR-3304
It's just a TODO issue right now, but when it's completed you'll be able to do
polygon spatial queries. All the software to do it is written right now, but
the missing Solr piece is temporarily hosted at Spatial4j.com. If you were to
try to use it, you would need to build it as of the same date that the Lucene
spatial module was added, in LUCENE-3795. Also, FYI, to do polygons you need a
3rd-party jar, JTS.

I'm working through a backlog of things to get to but will get to it.

~ David Smiley




Re: Solr Core Admin Question on Trunk

2012-04-23 Thread Stefan Matheis
Jamie, right .. that makes sense. Right now the core admin will not work in
single-core mode because we have no core name there.
https://issues.apache.org/jira/browse/SOLR-2605 should fix this; afterwards we
can show the core admin for every configuration. Would you mind opening a
ticket for that?




solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-23 Thread geeky2
hello all,

environment: CentOS and Solr 3.5

i am attempting to set up replication between two solr boxes (master and
slave).

i am getting the following in the logs on the slave box.

2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Master at:
http://someip:someport/somepath/somecore/admin/replication/ is not
available. Index fetch failed. Exception: Invalid version (expected 2, but
10) or the data in not in 'javabin' format

master jvm (jboss host) is being started like this:

-Denable.master=true

slave jvm (jboss host) is being started like this:

-Denable.slave=true

does anyone have any ideas?

i have done the following:

used curl http://someip:someport/somepath/somecore/admin/replication/ from
slave to successfully see master

used ping from slave to master

switched out the dns name for master to hard coded ip address

made sure i can see
http://someip:someport/somepath/somecore/admin/replication/ in a browser


this is my request handler - i am using the same config file on both the
master and slave - but sending in the appropriate switch on start up (per
the solr wiki page on replication)

<requestHandler name="/replication" class="solr.ReplicationHandler">

  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>

  <str name="maxNumberOfBackups">1</str>

  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://someip:someport/somecore/admin/replication/</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>


any suggestions would be great

thank you,
mark



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3932921.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial4j

2012-04-23 Thread Eric Grobler
Hi David,

Thank you for the information.
I am glad to hear that it is basically ready to be integrated into Lucene.

Regarding your backlog, is it realistic to expect SOLR-3304 to be resolved before June?

Best Regards
Ericz

On Mon, Apr 23, 2012 at 4:38 PM, Smiley, David W. dsmi...@mitre.org wrote:

 Ericz,

 See this issue:  https://issues.apache.org/jira/browse/SOLR-3304
 It's just a TODO issue right now but when it's completed, you'll be able
 to do polygon spatial queries.  All the software is written to do it right
 now but the missing Solr piece is temporarily at Spatial4j.com.  If you
 were to try to use it, you would need to build it as of the same date that
 the Lucene spatial module was added, in LUCENE-3795.  Also, FYI to do
 polygons, you need a 3rd party jar, JTS.

 I'm working through a backlog of things to get to but will get to it.

 ~ David Smiley

 On Apr 23, 2012, at 11:09 AM, Eric Grobler wrote:

  Hello Solr Community,
 
  We are interested in polygon spatial queries.
  I believe that Spatial4j supports it.
 
  Is there a solr branch available that includes Spatial4j?
  Will this be part of a future Solr release?
 
  Thank you.
 
  Best Regards
  Ericz




Kernel methods in SOLR

2012-04-23 Thread Peyman Faratin
Hi

Has there been any work that tries to integrate kernel methods [1] with Solr? I 
am interested in using kernel methods to solve synonymy, hyponymy and polysemy 
(disambiguation) problems, which Solr's vector space model (bag of words) does 
not capture. 

For example, imagine we have only 3 words in our corpus, puma, cougar and 
feline. The 3 words obviously have interdependencies (puma disambiguates to 
cougar; cougar and puma are instances of felines - hyponyms). Now, imagine 2 
docs, d1 and d2, that have the following TF-IDF vectors. 

         puma  cougar  feline
d1  =  [    2,      0,      0 ]
d2  =  [    0,      1,      0 ]

i.e. d1 has no mention of the terms cougar or feline and, conversely, d2 has no 
mention of the terms puma or feline. Hence under the vector approach d1 and d2 
are not related at all (and each interpretation of the terms has a unique 
vector), which is not what we want to conclude. 

What I need is to include a kernel matrix (as data) such as the following that 
captures these relationships:

            puma  cougar  feline
puma    =  [    1,      1,    0.4 ]
cougar  =  [    1,      1,    0.4 ]
feline  =  [  0.4,    0.4,      1 ]

then recompute the TF-IDF vector as a product of (1) the original vector and 
(2) the kernel matrix, resulting in

         puma  cougar  feline
d1  =  [    2,      2,    0.8 ]
d2  =  [    1,      1,    0.4 ]

(note, the new vectors are much less sparse). 
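
Written out, the recomputation above is just a vector-matrix product (K is
symmetric here, so orientation doesn't matter):

\[
\mathbf{d}' = \mathbf{d}K, \qquad d'_i = \sum_j d_j K_{ji},
\quad\text{e.g.}\quad
[\,2 \;\; 0 \;\; 0\,]
\begin{pmatrix} 1 & 1 & 0.4 \\ 1 & 1 & 0.4 \\ 0.4 & 0.4 & 1 \end{pmatrix}
= [\,2 \;\; 2 \;\; 0.8\,].
\]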

I can solve this problem (inefficiently) at the application layer, but I was 
wondering if there have been any attempts within the community to solve similar 
problems efficiently, without paying a hefty response-time price?

thank you 

Peyman

[1] http://en.wikipedia.org/wiki/Kernel_methods

Re: Language Identification

2012-04-23 Thread Bai Shen
I was under the impression that Solr does both Tika's language identifier
and the one that Shuyo did.  The page at
http://wiki.apache.org/solr/LanguageDetection lists them both.

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"/>
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"/>
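
For reference, either factory gets wired into an update chain along these
lines (a sketch based on that wiki page; the chain name and field names are
illustrative):

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- fields sampled for detection (illustrative names) -->
    <str name="langid.fl">title,text</str>
    <!-- field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>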

Again, I'm just trying to understand why it was moved to solr.


On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 Solr just reuses Tika's language identifier. But you are of course free to
 do your language detection on the Nutch side if you choose and not invoke
 the one in Solr.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 20. apr. 2012, at 21:49, Bai Shen wrote:

  I'm working on using Shuyo's work to improve the language identification
 of
  our search.  Apparently, it's been moved from Nutch to Solr.  Is there a
  reason for this?
 
  http://code.google.com/p/language-detection/issues/detail?id=34
 
  I would prefer to have the processing done in Nutch as that has the
 benefit
  of more hardware and not interfering with Solr latency.
 
  Thanks.




Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).

2012-04-23 Thread Husain, Yavar

Solr 3.5 was not returning results. To my surprise, Tomcat 6.x (64 bit) was not 
running on my Windows machine. There were absolutely no errors in the logs, no 
crash dumps, nothing. I restarted it and everything seems to be fine now.

Went to the Windows Event viewer and exported the following information as it 
relates to Tomcat:

Level        Date and Time           Source                   Event ID  Task Category
Information  04/23/2012 8:51:58 AM   Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.
Error        04/23/2012 4:17:12 AM   Service Control Manager  7034      None
             The Apache Tomcat 6 service terminated unexpectedly.  It has done this 2 time(s).
Information  04/16/2012 3:13:15 PM   Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.
Error        04/16/2012 1:12:47 PM   Service Control Manager  7034      None
             The Apache Tomcat 6 service terminated unexpectedly.  It has done this 1 time(s).
Information  04/07/2012 10:02:25 PM  Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.

It is a mystery to me, as I don't have any errors in the Tomcat logs. How 
should I go about debugging this problem?

Any help would be appreciated.



Re: # open files with SolrCloud

2012-04-23 Thread Gopal Patwa
Great! I am going to try the new Solr 4 build from April 23rd.

On Sun, Apr 22, 2012 at 11:35 PM, Sami Siren ssi...@gmail.com wrote:

 On Sat, Apr 21, 2012 at 9:57 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
  I can reproduce some kind of searcher leak issue here, even w/o
  SolrCloud, and I've opened
  https://issues.apache.org/jira/browse/SOLR-3392

 With the fix integrated, I do not see the leaking problem anymore with
 my setup, so it seems to be working now.

 --
  Sami Siren



Re: Solr Core Admin Question on Trunk

2012-04-23 Thread Jamie Johnson
No problem, I created this:

https://issues.apache.org/jira/browse/SOLR-3401, and related it to SOLR-2605.

On Mon, Apr 23, 2012 at 11:39 AM, Stefan Matheis
matheis.ste...@googlemail.com wrote:
 Jamie, right .. that makes sense. Right now the core-admin will not work in 
 singlecore-mode because we have no core-name there. 
 https://issues.apache.org/jira/browse/SOLR-2605 should fix this; afterwards 
 we can show the core-admin for every configuration. Would you mind opening a 
 ticket for that?



 On Monday, April 23, 2012 at 5:25 PM, Jamie Johnson wrote:

 So I believe I see the reason now. Basically, in app.js we check
 whether there is more than 1 core deployed to decide if we show the
 core admin or not. I am not sure whether this is intended, but I would
 think this isn't what we want the default action to be. Shouldn't we
 always show the core admin menu option so users can grow their Solr
 instances without having to execute the core admin commands from curl
 or something?

 On Mon, Apr 23, 2012 at 11:04 AM, Jamie Johnson jej2...@gmail.com 
 (mailto:jej2...@gmail.com) wrote:
  I just updated to the latest Solr nightly build to address the issue
  Yonik fixed in 3392 and have noticed that I no longer have a core
  admin button in my admin interface. What specifically controls if
  this is shown or not?
 
  I am also not ruling out the chance I've messed something up but I was
  wondering if there are a set of conditions that controls if this is
  shown or not.






Re: Spatial4j

2012-04-23 Thread David Smiley (@MITRE.org)
Yes, I definitely think so.  At a minimum, I expect there will at least be a
patch or built jar file for you to get going by 1 June.

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial4j-tp3932748p3933368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Deciding whether to stem at query time

2012-04-23 Thread Walter Underwood
There is a third approach. Create two fields and always query both of them, 
with the exact field given a higher weight. This works great and performs well.

It is what we did at Netflix and what I'm doing at Chegg.
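
A minimal schema sketch of that two-field approach (field and type names are
illustrative, as is the weighting):

<field name="title"       type="text_stemmed" indexed="true" stored="true"/>
<field name="title_exact" type="text_exact"   indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>

<!-- then query both, weighting the exact field higher, e.g. with edismax:
     defType=edismax & qf=title_exact^2 title -->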

wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:

 So I just realized the other day that stemming basically happens at index
 time. If I'm understanding correctly, there's no way to allow a user to
 specify, at run time, whether to stem particular words or not based on a
 single index. I think there are two options, but I'd love to hear that I'm
 wrong:
 
 1.) Incrementally build up a white list of words that don't stem very well.
 To pick a random example out of the blue, light isn't super closely
 related to lighter, so I might choose not to stem that. If I wanted to
 do this, I think (if I understand correctly), stemmerOverrideFilter would
 help me out with this. I'm not a big fan of this approach.
 
 2.) Index all the text in two fields, once with stemming and once without.
 Then build some kind of option into the UI for specifying whether to stem
 the words or not, and search the appropriate field. Unfortunately, this
 would roughly double the size of my index, and probably affect query times
 too. Plus, the UI would probably suck.
 
 Am I missing an option? Has anyone tried one of these approaches?
 
 Thanks!
 Andrew







Re: Spatial4j

2012-04-23 Thread Eric Grobler
Thank you David,

it is fantastic what people like you do for the Solr community.


On Mon, Apr 23, 2012 at 8:08 PM, David Smiley (@MITRE.org) 
dsmi...@mitre.org wrote:

 Yes, I definitely think so.  At a minimum, I expect there will at least be
 a
 patch or built jar file for you to get going by 1 June.

 -
  Author:
 http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Spatial4j-tp3932748p3933368.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Language Identification

2012-04-23 Thread Robert Muir
On Mon, Apr 23, 2012 at 1:27 PM, Bai Shen baishen.li...@gmail.com wrote:
 I was under the impression that Solr does both Tika's language identifier
 and the one that Shuyo did.  The page at
 http://wiki.apache.org/solr/LanguageDetection lists them both.

 <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"/>
 <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"/>

 Again, I'm just trying to understand why it was moved to solr.


Because it offers a number of features beyond Tika's implementation,
and it is available under the Apache 2.0 License, so we are free to do
that.

-- 
lucidimagination.com


RE: Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).

2012-04-23 Thread Husain, Yavar
I am sorry, I should have raised this issue on the Tomcat forums. However, I 
was just trying my luck here as it is indirectly related to Solr.

From: Husain, Yavar
Sent: Monday, April 23, 2012 11:07 PM
To: solr-user@lucene.apache.org
Subject: Apache Tomcat 6 service terminated unexpectedly.  It has done this 2 
time(s).

Solr 3.5 was not returning results. To my surprise, Tomcat 6.x (64 bit) was not 
running on my Windows machine. There were absolutely no errors in the logs, no 
crash dumps, nothing. I restarted it and everything seems to be fine now.

Went to the Windows Event viewer and exported the following information as it 
relates to Tomcat:

Level        Date and Time           Source                   Event ID  Task Category
Information  04/23/2012 8:51:58 AM   Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.
Error        04/23/2012 4:17:12 AM   Service Control Manager  7034      None
             The Apache Tomcat 6 service terminated unexpectedly.  It has done this 2 time(s).
Information  04/16/2012 3:13:15 PM   Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.
Error        04/16/2012 1:12:47 PM   Service Control Manager  7034      None
             The Apache Tomcat 6 service terminated unexpectedly.  It has done this 1 time(s).
Information  04/07/2012 10:02:25 PM  Service Control Manager  7036      None
             The Apache Tomcat 6 service entered the running state.

It is a mystery to me, as I don't have any errors in the Tomcat logs. How 
should I go about debugging this problem?

Any help would be appreciated.



Re: null pointer error with solr deduplication

2012-04-23 Thread Peter Markey
Thanks for the response. Yes, I agree with you that I have to check for the
uniqueness of doc ids, but our requirement is such that we need to send them
to Solr as they are. I know that Solr discards duplicate documents, and it
does not work well when we manually create the unique id. But I just wanted
to report the error since, in this scenario (I guess the components for
deduplication are pretty new), it would probably help the devs make the
behavior around duplicate documents more deterministic.
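
For reference, the setup under test follows the wiki's
SignatureUpdateProcessorFactory chain, roughly like this (the signature field
and source fields are illustrative):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that stores the computed signature -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <!-- fields the signature is computed from -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>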

On Sat, Apr 21, 2012 at 2:21 AM, Alexander Aristov 
alexander.aris...@gmail.com wrote:

 Hi

 I might be wrong but it's your responsibility to put unique doc IDs across
 shards.

 read this page

 http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

 particularly

   - Documents must have a unique key and the unique key must be stored
     (stored="true" in schema.xml)
   - *The unique key field must be unique across all shards.* If docs with
     duplicate unique keys are encountered, Solr will make an attempt to
     return valid results, but the behavior may be non-deterministic.

 So Solr behaves as it should :) _unexpectedly_

 But I agree in the sense that there should be no error, especially not an
 NPE.

 Best Regards
 Alexander Aristov


 On 21 April 2012 03:42, Peter Markey sudoma...@gmail.com wrote:

  Hello,
 
  I have been trying out deduplication in solr by following:
  http://wiki.apache.org/solr/Deduplication. I have defined a signature
  field
  to hold the values of the signature created based on few other fields in
 a
  document and the idea seems to work like a charm in a single solr
 instance.
  But, when I have multiple cores and try to do a distributed search (
  http://localhost:8080/solr/core0/select?q=*&shards=localhost:8080/solr/dedupe,localhost:8080/solr/dedupe2&facet=true&facet.field=doc_id
  ) I get the error pasted below. While normal search (with just q) works fine,
  the facet/stats queries seem to be the culprit. The doc_id contains
  duplicate ids since I'm testing the same set of documents indexed in both
  the cores (dedupe, dedupe2). Any insights would be highly appreciated.
 
  Thanks
 
 
 
  20-Apr-2012 11:39:35 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.NullPointerException
      at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:887)
      at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:633)
      at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
      at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
      at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
 



Re: FastVectorHighlighter - no highlights

2012-04-23 Thread Jeffrey Schmidt
This does not appear to be shingle-specific.  A non-shingled field is also NOT 
highlighted in the same manner with FVH.  I can see in the timing information 
that it takes much longer to run FVH than no highlighting at all, so Solr must 
be doing something.  But why it just lists the document IDs with little or no 
field highlighting is still a mystery.

Any ideas on where I should look in the configuration, parameters to try, etc.?

Cheers,

Jeff

On Apr 19, 2012, at 7:51 AM, Jeff Schmidt wrote:

 I am using Solr 4.0, and debug=timing shows Solr spending the great majority 
 of its time in the HighlightComponent. It seemed logical to look into the 
 FastVectorHighlighter.  It does seem much faster, but on the other hand, I'm 
 not getting the highlights I need. :)
 
 I've seen references to FVH not supporting MultiTerm and (non-fixed-size) 
 ngrams.  I'm using edismax, and I don't know if a certain configuration of 
 that becomes multi-term and that's my problem, or if this is something 
 completely different. I don't have ngrams, but I do shingle.  For the 
 examples below, I have these fields defined:
 
   <field name="n_macromolecule_name" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
   <field name="n_protein_family" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
   <field name="n_pathway_name" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
   <field name="n_cellreg_regulated_by" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
   <field name="n_cellreg_disease" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
   <field name="n_macromolecule_summary" type="text_lc_np_shingle" indexed="true"
          stored="true" multiValued="true" termVectors="true"
          termPositions="true" termOffsets="true"/>
 
 
 Note that all are both indexed and stored, multi-valued, and I have 
 termVectors="true" termPositions="true" termOffsets="true" to enable FVH. 
 When I had missed one of those on a field, I could see the log indicating as 
 much and reverting to the regular highlighter. I no longer see those 
 messages.  All of the above fields are of this type:
 
 <!-- A text field that forces lowercase, removes punctuation and
      generates shingles for phrase matching -->
 <fieldType name="text_lc_np_shingle" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <!-- strip punctuation -->
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="([\p{Punct}])" replacement="" replace="all"/>
     <!-- Remove any 0-length tokens. -->
     <filter class="solr.LengthFilterFactory" min="1" max="100"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
             outputUnigrams="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- strip punctuation -->
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="([\p{Punct}])" replacement="" replace="all"/>
     <!-- Remove any 0-length tokens. -->
     <filter class="solr.LengthFilterFactory" min="1" max="100"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
             outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
   </analyzer>
 </fieldType>
 
 
 Using the standard highlight component, for the search term cancer (rows=2), 
 I get the highlights I've come to appreciate:
 
 <lst name="highlighting">
   <lst name="ING:3lzx">
     <arr name="n_macromolecule_name">
       <str>&lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; susceptibility candidate 1</str>
     </arr>
     <arr name="n_protein_family">
       <str>&lt;span class="ingReasonText"&gt;Cancer&lt;/span&gt; susceptibility candidate 1</str>
     </arr>
   </lst>
   <lst name="ING:8lj">
     <arr name="n_macromolecule_name">
       <str>breast &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; 2, early onset</str>
     </arr>
     <arr name="n_pathway_name">
       <str>Hereditary Breast &lt;span class="ingReasonText"&gt;Cancer&lt;/span&gt; Signaling</str>
     </arr>
     <arr name="n_cellreg_regulated_by">
       <str>prostate &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; cells</str>
     </arr>
     <arr name="n_cellreg_disease">
       <str>breast &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt;</str>

Re: Language Identification

2012-04-23 Thread Jan Høydahl
I think nothing has moved. We just offer Solr users the option to do language 
detection inside of Solr, using either of these two libs. If you choose to do 
language detection on the client side instead, using either of them, what is 
stopping you?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23. apr. 2012, at 19:27, Bai Shen wrote:

 I was under the impression that Solr does both Tika's language identifier
 and the one that Shuyo did.  The page at
 http://wiki.apache.org/solr/LanguageDetection lists them both.
 
 <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"/>
 <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"/>
 
 Again, I'm just trying to understand why it was moved to solr.
 
 
 On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl jan@cominvent.com wrote:
 
 Hi,
 
 Solr just reuses Tika's language identifier. But you are of course free to
 do your language detection on the Nutch side if you choose and not invoke
 the one in Solr.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 On 20. apr. 2012, at 21:49, Bai Shen wrote:
 
 I'm working on using Shuyo's work to improve the language identification
 of
 our search.  Apparently, it's been moved from Nutch to Solr.  Is there a
 reason for this?
 
 http://code.google.com/p/language-detection/issues/detail?id=34
 
 I would prefer to have the processing done in Nutch as that has the
 benefit
 of more hardware and not interfering with Solr latency.
 
 Thanks.
 
 



java 1.6 requirement not documented clearly?

2012-04-23 Thread jmlucjav
Both the wiki http://wiki.apache.org/solr/SolrInstall and the tutorial
http://lucene.apache.org/solr/api/doc-files/tutorial.html state that Java 1.5
is required, but trying to run Solr 3.6 with Java 1.5 was giving a colleague
some cryptic error.

xab

--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-1-6-requirement-not-documented-clearly-tp3933799p3933799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: java 1.6 requirement not documented clearly?

2012-04-23 Thread Chris Hostetter
: Both the wiki http://wiki.apache.org/solr/SolrInstall and the tutorial
: http://lucene.apache.org/solr/api/doc-files/tutorial.html state that Java 1.5
: is required, but trying to run Solr 3.6 with Java 1.5 was giving a colleague
: some cryptic error.

You'll have to be more specific about what you (or your colleague) were 
doing, and what error you got.

Solr 3.6 should work fine with Java 1.5



-Hoss


Re: Deciding whether to stem at query time

2012-04-23 Thread Michael Sokolov
Yes, and you might choose to use different options for different 
fields.  For dictionary searches, where users are searching for specific 
words, and a high degree of precision is called for, stemming is less 
helpful, but for full text searches, more so.


-Mike

On 4/23/2012 3:35 PM, Walter Underwood wrote:

There is a third approach. Create two fields and always query both of them, 
with the exact field given a higher weight. This works great and performs well.

It is what we did at Netflix and what I'm doing at Chegg.

wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:


So I just realized the other day that stemming basically happens at index
time. If I'm understanding correctly, there's no way to allow a user to
specify, at run time, whether to stem particular words or not based on a
single index. I think there are two options, but I'd love to hear that I'm
wrong:

1.) Incrementally build up a white list of words that don't stem very well.
To pick a random example out of the blue, light isn't super closely
related to lighter, so I might choose not to stem that. If I wanted to
do this, I think (if I understand correctly), stemmerOverrideFilter would
help me out with this. I'm not a big fan of this approach.

2.) Index all the text in two fields, once with stemming and once without.
Then build some kind of option into the UI for specifying whether to stem
the words or not, and search the appropriate field. Unfortunately, this
would roughly double the size of my index, and probably affect query times
too. Plus, the UI would probably suck.

Am I missing an option? Has anyone tried one of these approaches?

Thanks!
Andrew










Re: Deciding whether to stem at query time

2012-04-23 Thread Walter Underwood
Right. Stemming is less useful for author fields; you don't need to match bill 
gate or steve job.

Also, if you want to do fuzzy matching, you should only do that on the exact 
fields, not the stemmed fields.

wunder

On Apr 23, 2012, at 3:45 PM, Michael Sokolov wrote:

 Yes, and you might choose to use different options for different fields.  For 
 dictionary searches, where users are searching for specific words, and a high 
 degree of precision is called for, stemming is less helpful, but for full 
 text searches, more so.
 
 -Mike
 
 On 4/23/2012 3:35 PM, Walter Underwood wrote:
 There is a third approach. Create two fields and always query both of them, 
 with the exact field given a higher weight. This works great and performs 
 well.
 
 It is what we did at Netflix and what I'm doing at Chegg.
 
 wunder
 
 On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
 
 So I just realized the other day that stemming basically happens at index
 time. If I'm understanding correctly, there's no way to allow a user to
 specify, at run time, whether to stem particular words or not based on a
 single index. I think there are two options, but I'd love to hear that I'm
 wrong:
 
 1.) Incrementally build up a white list of words that don't stem very well.
 To pick a random example out of the blue, light isn't super closely
 related to lighter, so I might choose not to stem that. If I wanted to
 do this, I think (if I understand correctly), stemmerOverrideFilter would
 help me out with this. I'm not a big fan of this approach.
 
 2.) Index all the text in two fields, once with stemming and once without.
 Then build some kind of option into the UI for specifying whether to stem
 the words or not, and search the appropriate field. Unfortunately, this
 would roughly double the size of my index, and probably affect query times
 too. Plus, the UI would probably suck.
 
 Am I missing an option? Has anyone tried one of these approaches?
 
 Thanks!
 Andrew
 
 
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: java 1.6 requirement not documented clearly?

2012-04-23 Thread jmlucjav
oh, then it should work with 1.5?? OK, I know what happened then. I did not
see it happening myself, but he unzipped 3.6, started Solr with the example
config and got the error. He had Java 1.5, so I told him to upgrade; it
worked, so I assumed Solr required 1.6.

But this was on a Linux box, so most probably the Java 1.5 it was using was
GCJ...

thanks
xab

--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-1-6-requirement-not-documented-clearly-tp3933799p3933920.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-23 Thread Erick Erickson
Hmmm, does your master have an index? In other words, have you
added anything to it? I actually doubt that's an issue, but

As an aside, a polling interval of 20 seconds is rather short; beware of
your autowarming time exceeding the interval between your index updates

But my _first_ guess is that somehow your Solrs aren't the same
version, or you have a foo'd index on your master.

Best
Erick

On Mon, Apr 23, 2012 at 12:10 PM, geeky2 gee...@hotmail.com wrote:
 hello all,

 environment: CentOS and Solr 3.5

 i am attempting to set up replication between two solr boxes (master and
 slave).

 i am getting the following in the logs on the slave box.

 2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Master at:
 http://someip:someport/somepath/somecore/admin/replication/ is not
 available. Index fetch failed. Exception: Invalid version (expected 2, but
 10) or the data in not in 'javabin' format

 master jvm (jboss host) is being started like this:

 -Denable.master=true

 slave jvm (jboss host) is being started like this:

 -Denable.slave=true

 does anyone have any ideas?

 i have done the following:

 used curl http://someip:someport/somepath/somecore/admin/replication/ from
 slave to successfully see master

 used ping from slave to master

 switched out the dns name for master to hard coded ip address

 made sure i can see
 http://someip:someport/somepath/somecore/admin/replication/ in a browser


 this is my request handler - i am using the same config file on both the
 master and slave - but sending in the appropriate switch on start up (per
 the solr wiki page on replication)

  <requestHandler name="/replication" class="solr.ReplicationHandler">

    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
      <str name="commitReserveDuration">00:00:10</str>
    </lst>

    <str name="maxNumberOfBackups">1</str>

    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://someip:someport/somecore/admin/replication/</str>
      <str name="pollInterval">00:00:20</str>
      <str name="compression">internal</str>
      <str name="httpConnTimeout">5000</str>
      <str name="httpReadTimeout">1</str>
    </lst>
  </requestHandler>


 any suggestions would be great

 thank you,
 mark



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3932921.html
 Sent from the Solr - User mailing list archive at Nabble.com.