Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6
Thanks Swaraj. It is working now, after running it without "start" and changing the ZooKeeper port to 2888 instead.

Regards,
Edwin

On 7 April 2015 at 14:59, Swaraj Kumar swaraj2...@gmail.com wrote:

As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting, running it without "start" will fix this. One more change you need to make: Solr runs on 8983 by default and you have used 8983 in ZooKeeper, so start Solr on a different port.

Regards,
Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi Erick,

I think I'll just set up the ZooKeeper server in standalone mode first, before I get more confused, as I'm quite new to both Solr and ZooKeeper. Better not to jump the gun. However, I get this error when I try to start it in standalone mode:

    2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] - Invalid arguments, exiting abnormally
    java.lang.NumberFormatException: For input string: "C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg"
        at java.lang.NumberFormatException.forInputString(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    2015-04-07 11:59:51,796 [myid:] - INFO [main:ZooKeeperServerMain@55] - Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]

I have the following information in my zoo.cfg:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
    clientPort=8983

I get the same error even if I set clientPort=2888.

Regards,
Edwin

On 7 April 2015 at 11:26, Erick Erickson erickerick...@gmail.com wrote:

Believe me, I'm no ZooKeeper expert, but it looks to me like you're mixing Solr ports and ZooKeeper ports. AFAIK, the two ports in the zoo.cfg file are exclusively for the ZooKeeper instances to talk to each other. ZooKeeper isn't aware that the listening nodes are Solr nodes, so putting Solr ports in there is confusing ZooKeeper, I'd guess. Assuming you're starting your three ZK instances on ports 2888, 2889 and 2890, I'd expect the proper ports are:

    2888:3888
    2889:3889
    2890:3890

But as I said, I'm not a ZooKeeper expert, so beware.

Best,
Erick

On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi,

I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a ZooKeeper ensemble simulating 3 servers, all located on the same machine for testing purposes. In my zoo.cfg file, I have listed the 3 servers as follows:

    server.1=localhost:8983:3888
    server.2=localhost:8984:3889
    server.3=localhost:8985:3890

Then I try to start Solr using the following command:

    bin/solr start -e cloud -z localhost:8983 -noprompt

However, I'm unable to establish a connection from my Solr to ZooKeeper. Is this configuration possible, or is there anything which I missed? Thank you in advance for your help.

Regards,
Edwin
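For readers hitting the same NumberFormatException: the usage line in the error shows that ZooKeeperServerMain expects either a config file or a port plus data directory as its arguments, so passing "start" (as the Linux init-style scripts do) breaks the Windows script. A minimal sketch of a working standalone startup, assuming the ZooKeeper 3.4.6 layout and paths from the thread, with clientPort kept clear of Solr's default 8983:

    REM Start ZooKeeper standalone on Windows -- no "start" argument:
    cd C:\Users\edwin\zookeeper-3.4.6\bin
    zkServer.cmd

with conf\zoo.cfg along the lines of:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
    clientPort=2181

Solr can then be pointed at it with bin\solr start -e cloud -z localhost:2181 -noprompt (port 2181 is the conventional ZooKeeper default; any free port other than Solr's own works).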
Re: Solr 4.2.0 index corruption issue
Hi Guys,

Can someone please help pinpoint the issue here?

Thanks & Regards,
Puneet

On Mon, Apr 6, 2015 at 1:27 PM, Puneet Jain ja.pun...@gmail.com wrote:

Hi Guys,

I have been using Solr 4.2.0 for more than a year, and since October 2014 I have been facing an index corruption issue. It now happens every day, and I have to build a fresh index as a temporary fix. Please find the logs below, where I can see an error while replicating data from master to slave; the index corruption shows up on the slave nodes:

    2015-04-05 00:00:37,671 ERROR snapPuller-15-thread-1 [handler.SnapPuller] - Error closing the file stream: _1re_Lucene41_0.tim
    java.io.IOException: Input/output error
        at java.io.RandomAccessFile.close0(Native Method)
        at java.io.RandomAccessFile.close(RandomAccessFile.java:543)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:494)
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1223)
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1117)
        at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:744)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:398)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

Not finding an exact solution, I was thinking of upgrading to Solr 4.7.0, as it uses newer versions of httpcomponents and I thought the older version might have some issues. Can someone please recommend what can be done to avoid the index corruption issue in Solr 4.2.0? Thanks in advance!

Thanks & Regards,
Puneet
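Not mentioned in the thread, but a common first diagnostic for a suspect index is Lucene's CheckIndex tool, which ships in lucene-core. A hedged sketch of invoking it against a slave's data directory (the jar name and index path are illustrative and must match the installation's Lucene version):

    # Verify the index read-only first; -fix drops unreadable segments
    # and permanently loses the documents in them, so use it as a last resort.
    java -cp lucene-core-4.2.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/index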
Re: Collapse and Expand behaviour on result with 1 document.
Hi Joel,

Is the number-of-documents info available when using the collapse and expand parameters? I can't seem to find it in the returned XML. I know the numFound in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">) refers to the number of collapsed groups. Do I need to issue another query without the collapse and expand parameters to get the total number of documents? Or is there any field or parameter indicating the number of documents that can be returned through the 'fl' parameter? I am trying to display such info on the front-end, e.g. "571 led results from 240 suppliers".

On 4/1/2015 7:05 PM, Joel Bernstein wrote:

Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:

Hi Joel,

Correct me if my understanding is wrong. Using supplier id as the field to collapse on:

- If the collapse group heads in the main result set have only 1 document in each group, the expanded section will be empty, since there are no documents to expand for each collapsed group.
- To render the page, I need to iterate the main result set. For each document I have to check if there is an expanded group with the same supplier id.
- The facet counts are based on the number of collapsed groups in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">).

-Derek

On 3/31/2015 7:43 PM, Joel Bernstein wrote:

The way that collapse/expand is designed to be used is as follows: the main result set will contain the collapsed group heads; the expanded section will contain the expanded groups for the page of results. To render the page you iterate the main result set. For each document, check to see if there is an expanded group.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote:

You should be able to use collapse/expand with one result. Does the document in the main result set have group members that aren't being expanded?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote:

If I want to group the results (by a certain field) even when there is only 1 document, should I use the group parameter instead? The requirement is to group the result of product documents by their supplier id:

    group=true&group.field=P_SupplierId&group.limit=5

Is it true that the performance of collapse is better than the group parameter on a large data set, say 10-20 million documents?

-Derek

On 3/31/2015 10:03 AM, Joel Bernstein wrote:

The expanded section will only include groups that have expanded documents. So, if the document in the main result set has no documents to expand, then this is working as expected.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com wrote:

Hi,

I have a query which returns 1 document. When I add the collapse and expand parameters to it, expand=true&expand.rows=5&fq={!collapse field=P_SupplierId}, the expanded section is empty (<lst name="expanded"/>). Is this the behaviour of the collapse and expand parameters on a result which contains only 1 document?

-Derek
RE: How do I use CachedSqlEntityProcessor?
The conversation helped me understand the cached processor a lot. I'm working on a DIH cache using MapDB as the backing engine instead of the default CachedSqlEntityProcessor.
Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6
As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting, running it without "start" will fix this. One more change you need to make: Solr runs on 8983 by default and you have used 8983 in ZooKeeper, so start Solr on a different port.

Regards,
Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi Erick,

I think I'll just set up the ZooKeeper server in standalone mode first, before I get more confused, as I'm quite new to both Solr and ZooKeeper. Better not to jump the gun. However, I get this error when I try to start it in standalone mode:

    2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] - Invalid arguments, exiting abnormally
    java.lang.NumberFormatException: For input string: "C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg"
        at java.lang.NumberFormatException.forInputString(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    2015-04-07 11:59:51,796 [myid:] - INFO [main:ZooKeeperServerMain@55] - Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]

I have the following information in my zoo.cfg:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
    clientPort=8983

I get the same error even if I set clientPort=2888.

Regards,
Edwin

On 7 April 2015 at 11:26, Erick Erickson erickerick...@gmail.com wrote:

Believe me, I'm no ZooKeeper expert, but it looks to me like you're mixing Solr ports and ZooKeeper ports. AFAIK, the two ports in the zoo.cfg file are exclusively for the ZooKeeper instances to talk to each other. ZooKeeper isn't aware that the listening nodes are Solr nodes, so putting Solr ports in there is confusing ZooKeeper, I'd guess. Assuming you're starting your three ZK instances on ports 2888, 2889 and 2890, I'd expect the proper ports are:

    2888:3888
    2889:3889
    2890:3890

But as I said, I'm not a ZooKeeper expert, so beware.

Best,
Erick

On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi,

I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a ZooKeeper ensemble simulating 3 servers, all located on the same machine for testing purposes. In my zoo.cfg file, I have listed the 3 servers as follows:

    server.1=localhost:8983:3888
    server.2=localhost:8984:3889
    server.3=localhost:8985:3890

Then I try to start Solr using the following command:

    bin/solr start -e cloud -z localhost:8983 -noprompt

However, I'm unable to establish a connection from my Solr to ZooKeeper. Is this configuration possible, or is there anything which I missed? Thank you in advance for your help.

Regards,
Edwin
What is the best way of Indexing different formats of documents?
Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr.

What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

Thanks,
Sangeetha
Re: Collapse and Expand behaviour on result with 1 document.
I believe that, currently, issuing another query will be necessary to get the count of the expanded result set. I think it does make sense to include this information as part of the ExpandComponent output, so feel free to create a jira ticket for this and we should be able to get it into a future release.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 7, 2015 at 3:27 AM, Derek Poh d...@globalsources.com wrote:

Hi Joel,

Is the number-of-documents info available when using the collapse and expand parameters? I can't seem to find it in the returned XML. I know the numFound in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">) refers to the number of collapsed groups. Do I need to issue another query without the collapse and expand parameters to get the total number of documents? Or is there any field or parameter indicating the number of documents that can be returned through the 'fl' parameter? I am trying to display such info on the front-end, e.g. "571 led results from 240 suppliers".

On 4/1/2015 7:05 PM, Joel Bernstein wrote:

Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:

Hi Joel,

Correct me if my understanding is wrong. Using supplier id as the field to collapse on:

- If the collapse group heads in the main result set have only 1 document in each group, the expanded section will be empty, since there are no documents to expand for each collapsed group.
- To render the page, I need to iterate the main result set. For each document I have to check if there is an expanded group with the same supplier id.
- The facet counts are based on the number of collapsed groups in the main result set (<result maxScore="6.470696" name="response" numFound="27" start="0">).

-Derek

On 3/31/2015 7:43 PM, Joel Bernstein wrote:

The way that collapse/expand is designed to be used is as follows: the main result set will contain the collapsed group heads; the expanded section will contain the expanded groups for the page of results. To render the page you iterate the main result set. For each document, check to see if there is an expanded group.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote:

You should be able to use collapse/expand with one result. Does the document in the main result set have group members that aren't being expanded?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote:

If I want to group the results (by a certain field) even when there is only 1 document, should I use the group parameter instead? The requirement is to group the result of product documents by their supplier id:

    group=true&group.field=P_SupplierId&group.limit=5

Is it true that the performance of collapse is better than the group parameter on a large data set, say 10-20 million documents?

-Derek

On 3/31/2015 10:03 AM, Joel Bernstein wrote:

The expanded section will only include groups that have expanded documents. So, if the document in the main result set has no documents to expand, then this is working as expected.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com wrote:

Hi,

I have a query which returns 1 document. When I add the collapse and expand parameters to it, expand=true&expand.rows=5&fq={!collapse field=P_SupplierId}, the expanded section is empty (<lst name="expanded"/>). Is this the behaviour of the collapse and expand parameters on a result which contains only 1 document?

-Derek
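Until such a feature exists, a sketch of the two-query approach with the thread's collapse field (the query term "led" and the counts are illustrative): the first request's numFound gives the number of suppliers (collapsed groups), the second's gives the total number of documents.

    /select?q=led&fq={!collapse field=P_SupplierId}&expand=true&expand.rows=5   -> numFound = 240 (groups)
    /select?q=led&rows=0                                                        -> numFound = 571 (documents)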
Re: What is the best way of Indexing different formats of documents?
You can choose either DIH or /update/extract to index docs in Solr. There are multiple benefits to DIH, which I list below:

1. Clean and update using a single command.
2. DIH can also optimize the index using optimize=true.
3. You can do a delta-import based on the last index time, whereas with /update/extract you have to handle delta imports manually.
4. You can use multiple entity processors and transformers with DIH, which is very useful for indexing exactly the data you want.
5. The rows query parameter limits the number of records.

Regards,
Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote:

Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr.

What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

Thanks,
Sangeetha
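For concreteness, a minimal data-config.xml sketch that crawls a directory and extracts text with Tika via DIH — the base directory and field names are illustrative assumptions, not from the thread:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <!-- Enumerate the files to index; baseDir is an assumed example path. -->
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/data/docs" fileName=".*" recursive="true" rootEntity="false">
          <!-- Hand each file to Tika for text extraction. -->
          <entity name="tika" processor="TikaEntityProcessor"
                  url="${files.fileAbsolutePath}" format="text" dataSource="bin">
            <field column="text" name="content"/>
          </entity>
        </entity>
      </document>
    </dataConfig>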
Lucene indexWriter update does not affect Solr search
I implemented some small code for the purpose of extracting keywords out of a Lucene index. I implemented it as a search component. My problem is that when I update the Lucene IndexWriter, the Solr index which sits on top of it is not affected. As you can see, I did the commit part:

    BooleanQuery query = new BooleanQuery();
    for (String fieldName : keywordSourceFields) {
        TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
        query.add(termQuery, Occur.MUST_NOT);
    }
    TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
    query.add(termQuery, Occur.MUST);
    try {
        //Query q = new QueryParser(keywordField, new StandardAnalyzer()).parse(query.toString());
        TopDocs results = searcher.search(query, maxNumDocs);
        ScoreDoc[] hits = results.scoreDocs;
        IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
        for (int i = 0; i < hits.length; i++) {
            Document document = searcher.doc(hits[i].doc);
            List<String> keywords = keyword.getKeywords(hits[i].doc);
            if (keywords.size() > 0)
                document.removeFields(keywordField);
            for (String word : keywords) {
                document.add(new StringField(keywordField, word, Field.Store.YES));
            }
            String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
            writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)), document);
        }
        writer.commit();
        writer.forceMerge(1);
        writer.close();
    } catch (IOException | SyntaxError e) {
        throw new RuntimeException();
    }

Please help me solve this problem.

--
A.Nazemian
Re: Lucene indexWriter update does not affect Solr search
What are you trying to do? A search component is not intended for updating the index, so it really doesn’t surprise me that you aren’t seeing updates. I’d suggest you describe the problem you are trying to solve before proposing solutions.

Upayavira

On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:

I implemented some small code for the purpose of extracting keywords out of a Lucene index. I implemented it as a search component. My problem is that when I update the Lucene IndexWriter, the Solr index which sits on top of it is not affected. As you can see, I did the commit part:

    BooleanQuery query = new BooleanQuery();
    for (String fieldName : keywordSourceFields) {
        TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
        query.add(termQuery, Occur.MUST_NOT);
    }
    TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
    query.add(termQuery, Occur.MUST);
    try {
        //Query q = new QueryParser(keywordField, new StandardAnalyzer()).parse(query.toString());
        TopDocs results = searcher.search(query, maxNumDocs);
        ScoreDoc[] hits = results.scoreDocs;
        IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
        for (int i = 0; i < hits.length; i++) {
            Document document = searcher.doc(hits[i].doc);
            List<String> keywords = keyword.getKeywords(hits[i].doc);
            if (keywords.size() > 0)
                document.removeFields(keywordField);
            for (String word : keywords) {
                document.add(new StringField(keywordField, word, Field.Store.YES));
            }
            String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
            writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)), document);
        }
        writer.commit();
        writer.forceMerge(1);
        writer.close();
    } catch (IOException | SyntaxError e) {
        throw new RuntimeException();
    }

Please help me solve this problem.

--
A.Nazemian
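As a hedged alternative to writing through a raw IndexWriter from inside a search component, the keywords could be written back through Solr's normal update path with a SolrJ atomic update, so Solr manages the writer, commits, and searcher reopening itself. A sketch assuming SolrJ 4.x; the core URL and the field names "id" and "keywordField" are illustrative (atomic updates also require an updateLog and stored fields):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class KeywordUpdater {
        // Replace the keyword field of one document via an atomic "set" update.
        public static void setKeywords(String id, List<String> keywords) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            Map<String, Object> op = new HashMap<String, Object>();
            op.put("set", keywords); // atomic "set" replaces any existing values
            doc.addField("keywordField", op);
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }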
Re: What is the best way of Indexing different formats of documents?
On Tue, Apr 7, 2015, at 11:48 AM, sangeetha.subraman...@gtnexus.com wrote:

Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

You can have Solr do the Tika work for you, by posting to /update/extract. See here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

You can only post one document at a time, and you will have to provide extra metadata fields in the URL you post to (e.g. the document ID). If the extracting update handler can handle what you need, then you are good. Otherwise, you will want to write your own code to call Tika, then push the extracted content as a plain document. Solr is just an HTTP server, so your application can post binary files for Solr to ingest with Tika, or otherwise.

Upayavira
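A hedged example of such a post, following the Solr Cell documentation linked above (the core name, document ID, and file name are illustrative):

    curl "http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&commit=true" \
      -F "myfile=@invoice.pdf"

Here literal.id supplies the required unique key, and Tika-extracted text lands in the default content field configured for the /update/extract handler.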
Re: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File
Typo: *even when the user delimits with a space (e.g. base ball should find baseball).

Thanks,

From: Mike L. javaone...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, April 7, 2015 9:05 AM
Subject: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File

Solr User Group -

I have a case where I need to be able to search against compound words, even when the user delimits with a space (e.g. baseball => base ball). I think I've solved this by creating a compound-words dictionary file containing the split words that I would want DictionaryCompoundWordTokenFilterFactory to split:

    base
    ball

I also applied the following rule in the synonym file (to allow baseball to also get a hit):

    baseball => base ball

with the filter configured as:

    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compound-words.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>

Two questions: if I could figure out in advance all the compound words I would want to split, would it be better (more reliable results) for me to maintain this compound-words file, or would it be better to throw one of those open office dictionaries at the filter? Also, any better suggestions for dealing with this problem versus the one I described using both the dictionary filter and the synonym rule?

Thanks in advance!
Mike
Re: Lucene indexWriter update does not affect Solr search
I did some investigation and found out that retrieving documents works fine as long as Solr is not restarted, but searching the documents does not work. After I restarted Solr, it seems the core was corrupted and failed to start! Here is the corresponding log:

    org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:896)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:662)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:278)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1604)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1716)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
        ... 9 more
    Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in NRTCachingDirectory(MMapDirectory@C:\Users\Ali\workspace\lucene_solr_5_0_0\solr\server\solr\document\data\index lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bf76891; maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_2_Lucene50_0.doc, write.lock, _2_Lucene50_0.pos, _2.nvd, _2.fdt, _2_Lucene50_0.tim]
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:821)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:78)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:65)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:272)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:115)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1573)
        ... 11 more

    4/7/2015, 6:53:26 PM ERROR SolrIndexWriter SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
    4/7/2015, 6:53:26 PM ERROR SolrIndexWriter Error closing IndexWriter
    java.lang.NullPointerException
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2959)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2927)
        at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:965)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1010)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:130)
        at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:183)
        at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
        at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
        at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)

Therefore my guess would be a problem with indexing the keywordField, and also a problem related to closing the IndexWriter.

On Tue, Apr 7, 2015 at 6:13 PM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Upayavira,

Hi. It is just the part of my code which caused the problem. I know a searchComponent is not for changing the index, but for the purpose of extracting document keywords I was forced to hack a searchComponent to extract keywords and put them into the index. For more information about why I chose a searchComponent in the first place, please follow this link:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser

Best regards.

On Tue, Apr 7, 2015 at 5:30 PM, Upayavira u...@odoko.co.uk wrote:

What are you trying to do? A search component is not intended for updating the index, so it really doesn’t surprise me that you aren’t seeing updates. I’d suggest you describe the problem you are trying to solve before proposing solutions.

Upayavira

On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:

I implemented some small code for the purpose of extracting keywords out of a Lucene index. I implemented it as a search component. My problem is that when I update the Lucene IndexWriter, the Solr index which sits on top of it is not affected. As you can see, I did the commit part:

    BooleanQuery query = new BooleanQuery();
    for (String fieldName : keywordSourceFields) {
        TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
        query.add(termQuery, Occur.MUST_NOT);
    }
    TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
    query.add(termQuery, Occur.MUST);
    try {
        //Query q = new QueryParser(keywordField, new
Re: What is the best way of Indexing different formats of documents?
Well, I have indexed heterogeneous sources including a variety of NoSQLs, RDBMSs, and rich documents (PDF, Word, etc.) using SolrJ. The only prerequisite for using SolrJ is that you have an API to fetch data from your data source (say JDBC for an RDBMS, Tika for extracting text content from rich documents, etc.); then SolrJ is so damn great and simple. It's as simple as downloading the jar and writing a few lines of code to send data to your Solr server after pre-processing your data.

More details here:

http://lucidworks.com/blog/indexing-with-solrj/
https://wiki.apache.org/solr/Solrj
http://www.solrtutorial.com/solrj-tutorial.html

Cheers,
Yavar

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote:

Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

Thanks,
Sangeetha
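To make the "few lines of code" concrete, a minimal SolrJ indexing sketch, assuming SolrJ 5.x; the core URL and field names are illustrative, and "content" stands in for whatever text a Tika (or other) extraction step produced:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SimpleIndexer {
        public static void main(String[] args) throws Exception {
            // Point at the target core/collection.
            SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("content", "text extracted from the source document");
            client.add(doc);
            client.commit(); // make the document searchable
            client.close();
        }
    }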
Re: Lucene indexWriter update does not affect Solr search
Dear Upayavira,

Hi. It is just the part of my code which caused the problem. I know a searchComponent is not for changing the index, but for the purpose of extracting document keywords I was forced to hack a searchComponent to extract keywords and put them into the index. For more information about why I chose a searchComponent in the first place, please follow this link:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser

Best regards.

On Tue, Apr 7, 2015 at 5:30 PM, Upayavira u...@odoko.co.uk wrote:

What are you trying to do? A search component is not intended for updating the index, so it really doesn’t surprise me that you aren’t seeing updates. I’d suggest you describe the problem you are trying to solve before proposing solutions.

Upayavira

On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:

I implemented some small code for the purpose of extracting keywords out of a Lucene index. I implemented it as a search component. My problem is that when I update the Lucene IndexWriter, the Solr index which sits on top of it is not affected. As you can see, I did the commit part:

    BooleanQuery query = new BooleanQuery();
    for (String fieldName : keywordSourceFields) {
        TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
        query.add(termQuery, Occur.MUST_NOT);
    }
    TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
    query.add(termQuery, Occur.MUST);
    try {
        //Query q = new QueryParser(keywordField, new StandardAnalyzer()).parse(query.toString());
        TopDocs results = searcher.search(query, maxNumDocs);
        ScoreDoc[] hits = results.scoreDocs;
        IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
        for (int i = 0; i < hits.length; i++) {
            Document document = searcher.doc(hits[i].doc);
            List<String> keywords = keyword.getKeywords(hits[i].doc);
            if (keywords.size() > 0)
                document.removeFields(keywordField);
            for (String word : keywords) {
                document.add(new StringField(keywordField, word, Field.Store.YES));
            }
            String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
            writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)), document);
        }
        writer.commit();
        writer.forceMerge(1);
        writer.close();
    } catch (IOException | SyntaxError e) {
        throw new RuntimeException();
    }

Please help me solve this problem.

--
A.Nazemian

--
A.Nazemian
DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File
Solr User Group -

I have a case where I need to be able to search against compound words, even when the user delimits with a space (e.g. baseball => base ball). I think I've solved this by creating a compound-words dictionary file containing the split words that I would want DictionaryCompoundWordTokenFilterFactory to split:

    base
    ball

I also applied the following rule in the synonym file (to allow baseball to also get a hit):

    baseball => base ball

with the filter configured as:

    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compound-words.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>

Two questions: if I could figure out in advance all the compound words I would want to split, would it be better (more reliable results) for me to maintain this compound-words file, or would it be better to throw one of those open office dictionaries at the filter? Also, any better suggestions for dealing with this problem versus the one I described using both the dictionary filter and the synonym rule?

Thanks in advance!
Mike
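For context, a sketch of where such a filter sits in a schema.xml analyzer chain; the field type name and the surrounding tokenizer/filters are illustrative assumptions, while the compound-word filter itself is configured exactly as in the message above:

    <fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Splits tokens like "baseball" into "base" + "ball" per compound-words.txt -->
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
                dictionary="compound-words.txt" minWordSize="5"
                minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
      </analyzer>
    </fieldType>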
Re: What is the best way of Indexing different formats of documents?
Sangeetha,

You can also run Tika directly from the Data Import Handler, and the Data Import Handler can be made to run several threads if you can partition the input documents by directory or database id. I've done 4 threads by having a base configuration that does an Oracle query like this:

    SELECT *
    FROM (SELECT id, url, ..., Modulo(rowNum, 4) AS threadid
          FROM ... WHERE ...)
    WHERE threadid = %d

A bash/sed script writes the several data import handler XML files, so I can index with several threads at a time. Each of these threads can then use all the transformers, e.g. TemplateTransformer, etc. XML can be transformed via XSLT. The Data Import Handler also has entities that go out to the web and then index the document via Tika.

If you are indexing generic HTML, you may want to figure out an approach to SOLR-3808 and SOLR-2250. This can be resolved by recompiling Solr and Tika locally, because Boilerpipe has a bug that has been fixed but not pushed to Maven Central. Without that, the ASF cannot include the fix, but distributions such as LucidWorks Solr Enterprise can.

I can drop some configs onto github.com if I clean them up to obfuscate host names, passwords, and such.

On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain yavarhus...@gmail.com wrote:

Well, I have indexed heterogeneous sources including a variety of NoSQLs, RDBMSs, and rich documents (PDF, Word, etc.) using SolrJ. The only prerequisite for using SolrJ is that you have an API to fetch data from your data source (say JDBC for an RDBMS, Tika for extracting text content from rich documents, etc.); then SolrJ is so damn great and simple. It's as simple as downloading the jar and writing a few lines of code to send data to your Solr server after pre-processing your data.

More details here:

http://lucidworks.com/blog/indexing-with-solrj/
https://wiki.apache.org/solr/Solrj
http://www.solrtutorial.com/solrj-tutorial.html

Cheers,
Yavar

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote:

Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

Thanks,
Sangeetha
Re: What is the best way of Indexing different formats of documents?
The disadvantages of DIH are:

1. It's a black box; debugging it isn't easy.
2. It puts all the work on the Solr node. Parsing documents in various formats can be pretty heavyweight and steal cycles from indexing and searching.
2a. The extracting request handler also puts all the load on Solr, FWIW.

Personally I prefer an external program (and I was gratified to see Yavar's reference to the indexing-with-SolrJ article...). But then I'm a Java programmer by training, so that seems easy...

Best,
Erick

On Tue, Apr 7, 2015 at 7:41 AM, Dan Davis dansm...@gmail.com wrote:

Sangeetha,

You can also run Tika directly from the Data Import Handler, and the Data Import Handler can be made to run several threads if you can partition the input documents by directory or database id. I've done 4 threads by having a base configuration that does an Oracle query like this:

    SELECT *
    FROM (SELECT id, url, ..., Modulo(rowNum, 4) AS threadid
          FROM ... WHERE ...)
    WHERE threadid = %d

A bash/sed script writes the several data import handler XML files, so I can index with several threads at a time. Each of these threads can then use all the transformers, e.g. TemplateTransformer, etc. XML can be transformed via XSLT. The Data Import Handler also has entities that go out to the web and then index the document via Tika.

If you are indexing generic HTML, you may want to figure out an approach to SOLR-3808 and SOLR-2250. This can be resolved by recompiling Solr and Tika locally, because Boilerpipe has a bug that has been fixed but not pushed to Maven Central. Without that, the ASF cannot include the fix, but distributions such as LucidWorks Solr Enterprise can.

I can drop some configs onto github.com if I clean them up to obfuscate host names, passwords, and such.

On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain yavarhus...@gmail.com wrote:

Well, I have indexed heterogeneous sources including a variety of NoSQLs, RDBMSs, and rich documents (PDF, Word, etc.) using SolrJ. The only prerequisite for using SolrJ is that you have an API to fetch data from your data source (say JDBC for an RDBMS, Tika for extracting text content from rich documents, etc.); then SolrJ is so damn great and simple. It's as simple as downloading the jar and writing a few lines of code to send data to your Solr server after pre-processing your data.

More details here:

http://lucidworks.com/blog/indexing-with-solrj/
https://wiki.apache.org/solr/Solrj
http://www.solrtutorial.com/solrj-tutorial.html

Cheers,
Yavar

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote:

Hi,

I am a newbie to Solr, and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index the documents by streaming directly from the application? If so, what is the disadvantage of using that (versus DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help.

Thanks,
Sangeetha
Merge Two Fields in SOLR
Hi Group,

I am not sure if we have an easy way to merge two fields' data into one field; copyField doesn't work for us as it stores the values as multivalued. Can someone suggest a workaround to achieve this use case?

    FirstName:ABC
    SurName:XYZ

I need another field with Name:ABCXYZ, and I have to do it on the Solr end, as the source data is read-only and I have no control to combine the fields.

Thanks,
Ravi
Re: Problem with new solr.xml format and core swaps
Shawn:

I'm pretty clueless why you would be seeing this, and slammed with other stuff, so I can't dig into this right now. What do the core.properties files look like when you see this? They should be re-written when you swap cores.

Hmmm, I wonder if there's some condition where the files are already open and the persistence fails? If so we should be logging that error; I have no proof either way whether we are or not, though. Guessing that your log files in the problem case weren't all that helpful, but let's have a look at them if this occurs again?

Sorry I can't be more help,
Erick

On Mon, Apr 6, 2015 at 8:38 PM, Shawn Heisey apa...@elyograg.org wrote:

On 4/6/2015 6:40 PM, Erick Erickson wrote:

What version are you migrating _from_? 4.9.0? There were some persistence issues at one point, but AFAIK they were fixed by 4.9. I can check if you're on an earlier version...

Effectively there is no previous version. Whenever I upgrade, I delete all the data directories and completely reindex. When I converted from the old solr.xml to core discovery, the server was already on 4.9.1.

Thanks,
Shawn
Re: Trouble GetSpans lucene 4
Up. Anyone?

Best regards.

On 6 Apr 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:

Hi,

I'm working through the Taming Text book. I'm trying to upgrade the code from Solr 3.6 to Solr 4.10.2. At the moment, I have a problem with the getSpans method: spans.next() always returns false. Can anyone help?

    SpanNearQuery sQuery = (SpanNearQuery) origQuery;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    IndexReader reader = searcher.getIndexReader();
    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
    while (spans.next() == true) {
        // ...
    }

Thanks.
Regards.
RE: Trouble GetSpans lucene 4
What class is origQuery? You will have to do more rewriting/calculation if you're trying to convert a PhraseQuery to a SpanNearQuery.

If you dig around in org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene highlighter package, you might get some inspiration. I have a hack for converting regular queries to SpanQueries here (this is largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java

-----Original Message-----
From: Compte Poubelle [mailto:andymish...@yahoo.fr]
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up. Anyone?

Best regards.

On 6 Apr 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:

Hi,

I'm working through the Taming Text book. I'm trying to upgrade the code from Solr 3.6 to Solr 4.10.2. At the moment, I have a problem with the getSpans method: spans.next() always returns false. Can anyone help?

    SpanNearQuery sQuery = (SpanNearQuery) origQuery;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    IndexReader reader = searcher.getIndexReader();
    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
    while (spans.next() == true) {
        // ...
    }

Thanks.
Regards.
Re: Merge Two Fields in SOLR
Ravi, what about using field aliasing at search time? Would that do the trick for your use case?

http://localhost:8983/solr/mycollection/select?defType=edismax&q=name:john doe&f.name.qf=firstname surname

For more details:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Damien

On 04/07/2015 10:21 AM, Erick Erickson wrote:

I don't understand why copyField doesn't work. Admittedly the firstName and SurName would be separate tokens, but isn't that what you want? The fact that it's multiValued isn't really a problem; multiValued fields are really functionally identical to single-valued fields if you set positionIncrementGap to... hmmm... 1 or 0, I'm not quite sure which. Of course if you're sorting by the field, that's a different story. Here's a discussion with several options, but I really wonder what your specific objection to copyField is; it's the simplest, and on the surface it seems like it would work:

http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html

Best,
Erick

On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

Hi Group,

I am not sure if we have an easy way to merge two fields' data into one field; copyField doesn't work for us as it stores the values as multivalued. Can someone suggest a workaround to achieve this use case?

    FirstName:ABC
    SurName:XYZ

I need another field with Name:ABCXYZ, and I have to do it on the Solr end, as the source data is read-only and I have no control to combine the fields.

Thanks,
Ravi
Re: Merge Two Fields in SOLR
I don't understand why copyField doesn't work. Admittedly the firstName and SurName would be separate tokens, but isn't that what you want? The fact that it's multiValued isn't really a problem; multiValued fields are really functionally identical to single-valued fields if you set positionIncrementGap to... hmmm... 1 or 0, I'm not quite sure which. Of course if you're sorting by the field, that's a different story. Here's a discussion with several options, but I really wonder what your specific objection to copyField is; it's the simplest, and on the surface it seems like it would work:

http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html

Best,
Erick

On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

Hi Group,

I am not sure if we have an easy way to merge two fields' data into one field; copyField doesn't work for us as it stores the values as multivalued. Can someone suggest a workaround to achieve this use case?

    FirstName:ABC
    SurName:XYZ

I need another field with Name:ABCXYZ, and I have to do it on the Solr end, as the source data is read-only and I have no control to combine the fields.

Thanks,
Ravi
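If a single concatenated value really is required at index time (e.g. for sorting), one hedged option not raised in the thread is an update processor chain in solrconfig.xml. A sketch assuming the field names from the question; the chain name is illustrative, and it would be selected per request with update.chain=concat-name:

    <updateRequestProcessorChain name="concat-name">
      <!-- Copy both name parts into a single "Name" field... -->
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">FirstName</str>
        <str name="source">SurName</str>
        <str name="dest">Name</str>
      </processor>
      <!-- ...then collapse the two values into one string with no delimiter: ABCXYZ -->
      <processor class="solr.ConcatFieldUpdateProcessorFactory">
        <str name="fieldName">Name</str>
        <str name="delimiter"></str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

With the concatenation done before indexing, the Name field can then stay single-valued in the schema.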
Re: Problem with new solr.xml format and core swaps
On 4/7/2015 10:54 AM, Erick Erickson wrote:

I'm pretty clueless why you would be seeing this, and slammed with other stuff, so I can't dig into this right now. What do the core.properties files look like when you see this? They should be re-written when you swap cores. Hmmm, I wonder if there's some condition where the files are already open and the persistence fails? If so we should be logging that error; I have no proof either way whether we are or not, though. Guessing that your log files in the problem case weren't all that helpful, but let's have a look at them if this occurs again?

I hadn't had a chance to review the logs, but when I did just now, I found this:

    ERROR - 2015-04-07 11:56:15.568; org.apache.solr.core.CorePropertiesLocator; Couldn't persist core properties to /index/solr4/cores/sparkinc_0/core.properties: java.io.FileNotFoundException: /index/solr4/cores/sparkinc_0/core.properties (Permission denied)

That's fairly clear. I guess my permissions were wrong. My best guess as to why: things owned by root from when I created the core.properties files. Solr does not run as root. I didn't think to actually look at the permissions before I ran a script that I maintain which fixes all the ownership on my various directories involved in my full search installation.

I don't think this explains the not-deleted segment files problem. Those segment files were written by Solr running as the regular user, so there couldn't have been a permission problem.

Thanks,
Shawn
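For reference, a hedged sketch of the kind of ownership fix described above, assuming Solr runs as a "solr" user (the account name is an assumption; the path is from the log):

    # Re-own the core directories so the Solr process can rewrite core.properties.
    chown -R solr:solr /index/solr4/cores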
Re: Trouble GetSpans lucene 4
Re,

origQuery is a Query object; I got it from a ResponseBuilder object via the getQuery method:

    ResponseBuilder rb; // method parameter
    Query origQuery = rb.getQuery();

Thanks for the link, I'll keep you informed.

Regards,
Andy

On Tuesday, 7 April 2015 at 20:26, Allison, Timothy B. talli...@mitre.org wrote:

What class is origQuery? You will have to do more rewriting/calculation if you're trying to convert a PhraseQuery to a SpanNearQuery. If you dig around in org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene highlighter package, you might get some inspiration. I have a hack for converting regular queries to SpanQueries here (this is largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java

-----Original Message-----
From: Compte Poubelle [mailto:andymish...@yahoo.fr]
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up. Anyone?

Best regards.

On 6 Apr 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:

Hi,

I'm working through the Taming Text book. I'm trying to upgrade the code from Solr 3.6 to Solr 4.10.2. At the moment, I have a problem with the getSpans method: spans.next() always returns false. Can anyone help?

    SpanNearQuery sQuery = (SpanNearQuery) origQuery;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    IndexReader reader = searcher.getIndexReader();
    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
    while (spans.next() == true) {
        // ...
    }

Thanks.
Regards.
Re: Config join parse in solrconfig.xml
Cool. It actually works after I removed those extra columns. Thanks for your help.

On Mon, Apr 6, 2015 at 8:19 PM, Erick Erickson erickerick...@gmail.com wrote:

df does not allow multiple fields; it stands for "default field", not "default fields". To get what you're looking for, you need to use edismax or explicitly create the multiple clauses. I'm not quite sure what the join parser is doing with the df parameter, so my first question is: what happens if you just use a single field for df?

Best,
Erick

On Mon, Apr 6, 2015 at 11:51 AM, Frank li fudon...@gmail.com wrote:

The error message was from the query with debug=query.

On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote:

Hi Erick,

Thanks for your response. Here is the query I am sending:

http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0

You can see it has all_text:apple. I added the field name all_text because it gives an error without it.

Errors:

    <lst name="error">
      <str name="msg">undefined field all_text number party name all_code ent_name</str>
      <int name="code">400</int>
    </lst>

These fields are defined as the default search fields in our solrconfig.xml file:

    <str name="df">all_text number party name all_code ent_name</str>

Thanks,
Fudong

On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote:

You have to show us several more things:

1. What exactly does the query look like?
2. What do you expect?
3. The output when you specify debug=query.
4. Anything else that would help.

You might review: http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote:

Hi,

I am starting to use the join parser with our Solr. We have some default fields defined in solrconfig.xml:

    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">all_text number party name all_code ent_name</str>
      <str name="qf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
      <str name="fl">id description market_sector_type parent ult_parent ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds *_sms *_ss *_bs</str>
      <str name="q.op">AND</str>
    </lst>

I found out that once I use the join parser, it does not recognize the default fields any more. How do I modify the configuration for this?

Thanks,
Fred
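Since df takes exactly one field, a hedged sketch of the corrected defaults implied by the thread's resolution (the field names are from the config above; the multi-field behaviour moves entirely to edismax's qf):

    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- df must name a single field; multi-field search belongs in qf -->
      <str name="df">all_text</str>
      <str name="qf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
    </lst>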
RE: Trouble GetSpans lucene 4
Oh, OK. If that's just a regular query, you will need to convert it to a SpanQuery, and you may need to rewrite the SpanQuery after conversion.

If you're trying to do a concordance, or trying to retrieve windows around the hits, take a look at ConcordanceSearcher within:

https://github.com/tballison/lucene-addons/tree/master/lucene-5317

With any luck, I should find the time to get back to the Solr wrapper under SOLR-5411 that Jason Robinson initially developed.

-----Original Message-----
From: Test Test [mailto:andymish...@yahoo.fr]
Sent: Tuesday, April 07, 2015 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Re,

origQuery is a Query object; I got it from a ResponseBuilder object via the getQuery method:

    ResponseBuilder rb; // method parameter
    Query origQuery = rb.getQuery();

Thanks for the link, I'll keep you informed.

Regards,
Andy

On Tuesday, 7 April 2015 at 20:26, Allison, Timothy B. talli...@mitre.org wrote:

What class is origQuery? You will have to do more rewriting/calculation if you're trying to convert a PhraseQuery to a SpanNearQuery. If you dig around in org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene highlighter package, you might get some inspiration. I have a hack for converting regular queries to SpanQueries here (this is largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java

-----Original Message-----
From: Compte Poubelle [mailto:andymish...@yahoo.fr]
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up. Anyone?

Best regards.

On 6 Apr 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:

Hi,

I'm working through the Taming Text book. I'm trying to upgrade the code from Solr 3.6 to Solr 4.10.2. At the moment, I have a problem with the getSpans method: spans.next() always returns false. Can anyone help?

    SpanNearQuery sQuery = (SpanNearQuery) origQuery;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    IndexReader reader = searcher.getIndexReader();
    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
    while (spans.next() == true) {
        // ...
    }

Thanks.
Regards.
How to trace error records during POST?
Good morning,

I used Solr 4.7 to post 186,745 XML files, and 186,622 files have been indexed. That means there are 123 XML files with errors. How can I trace what these files are?

Thank you in advance,
Simon Cheng.
Re: Deploying multiple ZooKeeper ensemble on a single machine
You have to choose a unique clientPort for each server, and here I can see that you have the same client port for all 3 servers. You can refer to this link: http://myjeeva.com/zookeeper-cluster-setup.html
Re: Deploying multiple ZooKeeper ensemble on a single machine
On 4/7/2015 9:16 PM, Zheng Lin Edwin Yeo wrote:

I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now I'm trying to deploy a multiple-ZooKeeper ensemble (3 servers) on a single machine. These are the settings which I have configured, according to the Solr Reference Guide. These files are under the ZOOKEEPER_HOME\conf\ directory (C:\Users\edwin\zookeeper-3.4.6\conf):

*zoo.cfg*

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
    clientPort=2181
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

<snip>

    [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address localhost/127.0.0.1:3889
    java.net.ConnectException: Connection refused: connect

The first thing I would suspect when running any network program on a Windows machine that won't communicate is the Windows firewall, unless you have either turned off the firewall or you have explicitly configured an exception in the firewall for the relevant ports.

The other reply you got, from nutchsolruser, does point out that all three zookeeper configs are using 2181 as the clientPort. Because these are all running on the same machine, you must use a different port for each one. I'm not sure what happens to subsequent processes after the first one starts, but they won't work even if they do manage to start.

Thanks,
Shawn
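A sketch of the fix Shawn describes: keep the server.N lines identical across all three files and vary only clientPort (and dataDir, as already done). The specific ports below are illustrative:

    zoo.cfg:  clientPort=2181
    zoo2.cfg: clientPort=2182
    zoo3.cfg: clientPort=2183

    server.1=localhost:2888:3888    (unchanged, same in all three files)
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

Solr would then be pointed at the whole ensemble with a ZooKeeper connection string listing all three client ports, e.g. -z localhost:2181,localhost:2182,localhost:2183.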
Deploying multiple ZooKeeper ensemble on a single machine
Hi,

I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now I'm trying to deploy a multiple-ZooKeeper ensemble (3 servers) on a single machine. These are the settings which I have configured, according to the Solr Reference Guide. These files are under the ZOOKEEPER_HOME\conf\ directory (C:\Users\edwin\zookeeper-3.4.6\conf):

*zoo.cfg*

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
    clientPort=2181
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

*zoo2.cfg*

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\2
    clientPort=2181
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

*zoo3.cfg*

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\3
    clientPort=2181
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

I have also created the myid file at the respective dataDir location for each of the 3 servers:

- At C:\Users\edwin\zookeeper-3.4.6\1, the myid file contains just the number 1
- At C:\Users\edwin\zookeeper-3.4.6\2, the myid file contains just the number 2
- At C:\Users\edwin\zookeeper-3.4.6\3, the myid file contains just the number 3

However, I'm getting the following error when I run zkServer.cmd:

    2015-04-08 10:54:17,097 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue size: 1
    2015-04-08 10:54:17,097 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue size: 1
    2015-04-08 10:54:17,097 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening channel to server 2
    2015-04-08 10:54:18,097 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address localhost/127.0.0.1:3889
    java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
    2015-04-08 10:54:18,099 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening channel to server 3
    2015-04-08 10:54:19,099 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 3 at election address localhost/127.0.0.1:3890
    java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
    2015-04-08 10:54:19,101 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200

Is there anything which I could have set wrongly?

Regards,
Edwin