solr wiki
Can I be added to the Solr wiki contributors list? Username: garysieling. Thanks, Gary
Re: [SolrCloud] shard hash ranges changed after restoring backup
Hi Erick, I should add that our Solr cluster is in production and new documents are constantly indexed. The new cluster has been up for three weeks now. The problem was discovered only now because in our use case Atomic Updates and RealTime Gets are mostly performed on new documents. With almost absolute certainty there are already documents in the index that were distributed to the shards according to the new hash ranges. If we just changed the hash ranges in ZooKeeper, the index would still be in an inconsistent state. Is there any way to recover from this without having to re-index all documents? Best, Gary 2016-06-15 19:23 GMT+02:00 Erick Erickson : > Simplest, though a bit risky is to manually edit the znode and > correct the znode entry. There are various tools out there, including > one that ships with Zookeeper (see the ZK documentation). > > Or you can use the zkcli scripts (the Zookeeper ones) to get the znode > down to your local machine, edit it there and then push it back up to ZK. > > I'd do all this with my Solr nodes shut down, then insure that my ZK > ensemble was consistent after the update etc > > Best, > Erick > > On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao wrote: >> Hi all, >> >> My team at work maintains a SolrCloud 5.3.2 cluster with multiple >> collections configured with sharding and replication. >> >> We recently backed up our Solr indexes using the built-in backup >> functionality. After the cluster was restored from the backup, we >> noticed that atomic updates of documents are failing occasionally with >> the error message 'missing required field [...]'. The exceptions are >> thrown on a host on which the document to be updated is not stored. From >> this we are deducing that there is a problem with finding the right host >> by the hash of the uniqueKey. Indeed, our investigations so far showed >> that for at least one collection in the new cluster, the shards have >> different hash ranges assigned now. We checked the hash ranges by >> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard >> hash ranges of one collection that we debugged. >> >> Old cluster: >> shard1_0 8000 - aaa9 >> shard1_1 - d554 >> shard2_0 d555 - fffe >> shard2_1 - 2aa9 >> shard3_0 2aaa - 5554 >> shard3_1 - 7fff >> >> New cluster: >> shard1 8000 - aaa9 >> shard2 - d554 >> shard3 d555 - >> shard4 0 - 2aa9 >> shard5 2aaa - 5554 >> shard6 - 7fff >> >> Note that the shard names differ because the old cluster's shards were >> split. >> >> As you can see, the ranges of shard3 and shard4 differ from the old >> cluster. This change of hash ranges matches with the symptoms we are >> currently experiencing. >> >> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750 >> in which David Smiley comments: >> >> shard hash ranges aren't restored; this error could be disasterous >> >> It seems that this is what happened to us. We would like to hear some >> suggestions on how we could recover from this problem. >> >> Best, >> Gary
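A minimal sketch of the zkcli round trip Erick describes, assuming a Solr 5.x install layout and a hypothetical ZooKeeper ensemble; depending on the collection's state format, the znode to edit is either the shared /clusterstate.json or the collection's own /collections/<name>/state.json:

    # Hypothetical ZooKeeper connect string -- adjust to the real ensemble.
    ZKHOST=zk1:2181,zk2:2181,zk3:2181

    # Pull the state file down to the local machine.
    server/scripts/cloud-scripts/zkcli.sh -zkhost "$ZKHOST" \
      -cmd getfile /clusterstate.json /tmp/clusterstate.json

    # Edit the "range" values for the affected shards locally, then, with the
    # Solr nodes shut down, push the corrected file back up.
    server/scripts/cloud-scripts/zkcli.sh -zkhost "$ZKHOST" \
      -cmd putfile /clusterstate.json /tmp/clusterstate.json

Note that this only repairs the routing metadata; as Gary points out, documents that were already indexed under the wrong ranges would still need to be re-indexed or moved.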
[SolrCloud] shard hash ranges changed after restoring backup
Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup functionality. After the cluster was restored from the backup, we noticed that atomic updates of documents are occasionally failing with the error message 'missing required field [...]'. The exceptions are thrown on a host on which the document to be updated is not stored. From this we deduce that there is a problem with finding the right host by the hash of the uniqueKey. Indeed, our investigation so far shows that for at least one collection in the new cluster the shards now have different hash ranges assigned. We checked the hash ranges by querying /admin/collections?action=CLUSTERSTATUS. Below are the shard hash ranges of one collection that we debugged.

Old cluster:
  shard1_0   8000 - aaa9
  shard1_1        - d554
  shard2_0   d555 - fffe
  shard2_1        - 2aa9
  shard3_0   2aaa - 5554
  shard3_1        - 7fff

New cluster:
  shard1     8000 - aaa9
  shard2          - d554
  shard3     d555 -
  shard4     0    - 2aa9
  shard5     2aaa - 5554
  shard6          - 7fff

Note that the shard names differ because the old cluster's shards were split.

As you can see, the ranges of shard3 and shard4 differ from the old cluster. This change of hash ranges matches the symptoms we are currently experiencing.

We found the JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750, in which David Smiley comments:

  shard hash ranges aren't restored; this error could be disasterous

It seems that this is what happened to us. We would like to hear some suggestions on how we could recover from this problem.

Best,
Gary
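The hash-range check described above can also be scripted; a sketch with a hypothetical host and collection name:

    curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection&wt=json' \
      | python -m json.tool | grep -B2 '"range"'

Each shard's "range" value should match the ranges the existing documents were originally hashed against.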
Re: Can't index all docs in a local folder with DIH in Solr 5.0.0
Alex, I've created JIRA ticket: https://issues.apache.org/jira/browse/SOLR-7174 In response to your suggestions below: 1. No exceptions are reported, even with onError removed. 2. ProcessMonitor shows only the very first epub file is being read (repeatedly) 3. I can repeat this on Ubuntu (14.04) by following the same steps. 4. Ticket raised (https://issues.apache.org/jira/browse/SOLR-7174) Additionally (and I've added this on the ticket), if I change the dataConfig to use FileDataSource and PlainTextEntityProcessor, and just list *.txt files, it works! baseDir="c:/Users/gt/Documents/HackerMonthly/epub" fileName=".*txt"> processor="PlainTextEntityProcessor" url="${files.fileAbsolutePath}" format="text" dataSource="bin"> So it's something related to BinFileDataSource and TikaEntityProcessor. Thanks, Gary. On 26/02/2015 14:24, Gary Taylor wrote: Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind regards, Gary. On 26/02/2015 14:16, Alexandre Rafalovitch wrote: On 26 February 2015 at 08:32, Gary Taylor wrote: Alex, Same results on recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikeEntityProcessor though) and get exactly the same result - ie. all files fetched, but only one document indexed in Solr. To me, this would indicate that something is a problem with the inner DIH entity then. As a next set of steps, I would probably 1) remove both onError statements and see if there is an exception that is being swallowed. 2) run the import under ProcessMonitor and see if the other files are actually being read https://technet.microsoft.com/en-us/library/bb896645.aspx 3) Assume a Windows bug and test this on Mac/Linux 4) File a JIRA with a replication case. If there is a full replication setup, I'll test it machines I have access to with full debugger step-through For example, I wonder if FileBinDataSource is somehow not cleaning up after the first file properly on Windows and fails to open the second one. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.
Re: Can't index all docs in a local folder with DIH in Solr 5.0.0
Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind regards, Gary. On 26/02/2015 14:16, Alexandre Rafalovitch wrote: On 26 February 2015 at 08:32, Gary Taylor wrote: Alex, Same results on recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikeEntityProcessor though) and get exactly the same result - ie. all files fetched, but only one document indexed in Solr. To me, this would indicate that something is a problem with the inner DIH entity then. As a next set of steps, I would probably 1) remove both onError statements and see if there is an exception that is being swallowed. 2) run the import under ProcessMonitor and see if the other files are actually being read https://technet.microsoft.com/en-us/library/bb896645.aspx 3) Assume a Windows bug and test this on Mac/Linux 4) File a JIRA with a replication case. If there is a full replication setup, I'll test it machines I have access to with full debugger step-through For example, I wonder if FileBinDataSource is somehow not cleaning up after the first file properly on Windows and fails to open the second one. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.
Re: Can't index all docs in a local folder with DIH in Solr 5.0.0
Alex, Same results on recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikeEntityProcessor though) and get exactly the same result - ie. all files fetched, but only one document indexed in Solr. With verbose output, I get a row for each file in the directory, but only the first one has a non-empty documentImport entity. All subsequent documentImport entities just have an empty document#2 entry. eg: "verbose-output": [ "entity:files", [ null, "--- row #1-", "fileSize", 2609004, "fileLastModified", "2015-02-25T11:37:25.217Z", "fileAbsolutePath", "c:\\Users\\gt\\Documents\\epub\\issue018.epub", "fileDir", "c:\\Users\\gt\\Documents\\epub", "file", "issue018.epub", null, "-", "entity:documentImport", [ "document#1", [ "query", "c:\\Users\\gt\\Documents\\epub\\issue018.epub", "time-taken", "0:0:0.0", null, "--- row #1-", "text", "< ... parsed epub text - snip ... >" "title", "Issue 18 title", "Author", "Author text", null, "-" ], "document#2", [] ], null, "--- row #2-", "fileSize", 4428804, "fileLastModified", "2015-02-25T11:37:36.399Z", "fileAbsolutePath", "c:\\Users\\gt\\Documents\\epub\\issue019.epub", "fileDir", "c:\\Users\\gt\\Documents\\epub", "file", "issue019.epub", null, "-", "entity:documentImport", [ "document#2", [] ], null, "--- row #3-", "fileSize", 2580266, "fileLastModified", "2015-02-25T11:37:41.188Z", "fileAbsolutePath", "c:\\Users\\gt\\Documents\\epub\\issue020.epub", "fileDir", "c:\\Users\\gt\\Documents\\epub", "file", "issue020.epub", null, "-", "entity:documentImport", [ "document#2", [] ],
Re: Can't index all docs in a local folder with DIH in Solr 5.0.0
Alex, Thanks for the suggestions. It always just indexes 1 doc, regardless of the first epub file it sees. Debug / verbose don't show anything obvious to me. I can include the output here if you think it would help. I tried using the SimplePostTool first ( *java -Dtype=application/epub+zip -Durl=http://localhost:8983/solr/hn1/update/extract -jar post.jar \Users\gt\Documents\epub\*.epub) to index the docs and check the Tika parsing and that works OK so I don't think it's the e*pubs. I was trying to use DIH so that I could more easily specify the schema fields and store content in the index in preparation for trying out the search highlighting. Couldn't work out how to do that with post.jar Thanks, Gary On 25/02/2015 17:09, Alexandre Rafalovitch wrote: Try removing that first epub from the directory and rerunning. If you now index 0 documents, then there is something unexpected about them and DIH skips. If it indexes 1 document again but a different one, then it is definitely something about the repeat logic. Also, try running with debug and verbose modes and see if something specific shows up. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 25 February 2015 at 11:14, Gary Taylor wrote: I can't get the FileListEntityProcessor and TikeEntityProcessor to correctly add a Solr document for each epub file in my local directory. I've just downloaded Solr 5.0.0, on a Windows 7 PC. I ran "solr start" and then "solr create -c hn2" to create a new core. I want to index a load of epub files that I've got in a directory. So I created a data-import.xml (in solr\hn2\conf): In my solrconfig.xml, I added a requestHandler entry to reference my data-import.xml: data-import.xml I renamed managed-schema to schema.xml, and ensured the following doc fields were setup: I copied all the jars from dist and contrib\* into server\solr\lib. Stopping and restarting solr then creates a new managed-schema file and renames schema.xml to schema.xml.back All good so far. Now I go to the web admin for dataimport (http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and execute a full import. But, the results show "Requests: 0, Fetched: 58, Skipped: 0, Processed:1" - ie. it only adds one document (the very first one) even though it's iterated over 58! No errors are reported in the logs. I can search on the contents of that first epub document, so it's extracting OK in Tika, but there's a problem somewhere in my config that's causing only 1 document to be indexed in Solr. Thanks for any assistance / pointers. Regards, Gary -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd. -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.
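The debug/verbose run Alexandre suggests can also be triggered from the command line rather than the admin UI; a sketch assuming the core name from this thread and the stock DIH request handler:

    curl 'http://localhost:8983/solr/hn2/dataimport?command=full-import&debug=true&verbose=true&clean=false&wt=json'

Debug mode does not commit unless commit=true is added, so the call is safe to repeat while narrowing down which entity stops producing documents.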
Can't index all docs in a local folder with DIH in Solr 5.0.0
I can't get the FileListEntityProcessor and TikeEntityProcessor to correctly add a Solr document for each epub file in my local directory. I've just downloaded Solr 5.0.0, on a Windows 7 PC. I ran "solr start" and then "solr create -c hn2" to create a new core. I want to index a load of epub files that I've got in a directory. So I created a data-import.xml (in solr\hn2\conf): url="${files.fileAbsolutePath}" format="text" dataSource="bin" onError="skip"> In my solrconfig.xml, I added a requestHandler entry to reference my data-import.xml: class="org.apache.solr.handler.dataimport.DataImportHandler"> data-import.xml I renamed managed-schema to schema.xml, and ensured the following doc fields were setup: required="true" multiValued="false" /> stored="true" /> stored="true" multiValued="false"/> multiValued="true"/> I copied all the jars from dist and contrib\* into server\solr\lib. Stopping and restarting solr then creates a new managed-schema file and renames schema.xml to schema.xml.back All good so far. Now I go to the web admin for dataimport (http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and execute a full import. But, the results show "Requests: 0, Fetched: 58, Skipped: 0, Processed:1" - ie. it only adds one document (the very first one) even though it's iterated over 58! No errors are reported in the logs. I can search on the contents of that first epub document, so it's extracting OK in Tika, but there's a problem somewhere in my config that's causing only 1 document to be indexed in Solr. Thanks for any assistance / pointers. Regards, Gary -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.
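A sketch of driving the same import from the command line instead of the admin UI (core name as above, stock DIH handler path assumed):

    # Kick off a full import and then poll its status.
    curl 'http://localhost:8983/solr/hn2/dataimport?command=full-import&commit=true&wt=json'
    curl 'http://localhost:8983/solr/hn2/dataimport?command=status&wt=json'

The status response carries the same Fetched/Processed counters quoted above, which makes the one-document symptom easy to spot after each config change.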
Re:
Can anyone remove this spammer please? On Tue, Jul 23, 2013 at 4:47 AM, wrote: > > Hi! http://mackieprice.org/cbs.com.network.html > >
Re: Is it possible to searh Solr with a longer query string?
Oh this is good! On Wed, Jun 26, 2013 at 12:05 PM, Shawn Heisey wrote: > On 6/25/2013 6:15 PM, Jack Krupansky wrote: > > Are you using Tomcat? > > > > See: > > http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests > > > > Enabling Longer Query Requests > > > > If you try to submit too long a GET query to Solr, then Tomcat will > > reject your HTTP request on the grounds that the HTTP header is too > > large; symptoms may include an HTTP 400 Bad Request error or (if you > > execute the query in a web browser) a blank browser window. > > > > If you need to enable longer queries, you can set the maxHttpHeaderSize > > attribute on the HTTP Connector element in your server.xml file. The > > default value is 4K. (See > > http://tomcat.apache.org/tomcat-5.5-doc/config/http.html) > > Even better would be to force SolrJ to use a POST request. In newer > versions (4.1 and later) Solr sets the servlet container's POST buffer > size and defaults it to 2MB. In older versions, you'd have to adjust > this in your servlet container config, but the default should be > considerably larger than the header buffer used for GET requests. > > I thought that SolrJ used POST by default, but after looking at the > code, it seems that I was wrong. Here's how to send a POST query: > > response = server.query(query, METHOD.POST); > > The import required for this is: > > import org.apache.solr.client.solrj.SolrRequest.METHOD; > > Gary, if you can avoid it, you should not be creating a new > HttpSolrServer object every time you make a query. It is completely > thread-safe, so create a singleton and use it for all queries against > the medline core. > > Thanks, > Shawn > >
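A quick way to confirm that query length is the issue, independent of SolrJ, is to send the query as a POST body, which bypasses the container's GET header limit; a sketch against the medline core mentioned above:

    # 'q' stands in for the very long query string.
    curl 'http://localhost:8983/solr/medline/select' \
      --data-urlencode 'q=term1 OR term2 OR term3' \
      --data-urlencode 'rows=10' \
      --data-urlencode 'wt=json'

curl sends --data-urlencode parameters as an application/x-www-form-urlencoded POST, which Solr's /select handler accepts just like GET parameters.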
Re: doc cache issues... query-time way to bypass cache?
Sigh, user error. I missed this in the 4.1 release notes: Collections that do not specify numShards at collection creation time use custom sharding and default to the "implicit" router. Document updates received by a shard will be indexed to that shard, unless a "*shard*" parameter or document field names a different shard. On Fri, Mar 22, 2013 at 3:39 PM, Gary Yngve wrote: > I have a situation we just discovered in solr4.2 where there are > previously cached results from a limited field list, and when querying for > the whole field list, it responds differently depending on which shard gets > the query (no extra replicas). It either returns the document on the > limited field list or the full field list. > > We're releasing tonight, so is there a query param to selectively bypass > the cache, which I can use as a temp fix? > > Thanks, > Gary >
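For anyone hitting the same thing, the router choice is made at creation time; a sketch of creating a collection with numShards so that it gets the hash-based compositeId router rather than the implicit one (name and sizes hypothetical, and collection.configName may also be needed depending on how configs were loaded into ZooKeeper):

    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=4&replicationFactor=2'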
doc cache issues... query-time way to bypass cache?
I have a situation we just discovered in solr4.2 where there are previously cached results from a limited field list, and when querying for the whole field list, it responds differently depending on which shard gets the query (no extra replicas). It either returns the document on the limited field list or the full field list. We're releasing tonight, so is there a query param to selectively bypass the cache, which I can use as a temp fix? Thanks, Gary
Re: overseer queue clogged
Thanks, Mark! The core node names in the solr.xml in solr4.2 is great! Maybe in 4.3 it can be supported via API? Also I am glad you mentioned in other post the chance to namespace zookeeper by adding a path to the end of the comma-delim zk hosts. That works out really well in our situation for having zk serve multiple amazon environments that go up and down independently of each other -- no issues w/ shared clusterstate.json or overseers. Regarding our original problem, we were able to restart all our shards but one, which wasn't getting past Mar 20, 2013 5:12:54 PM org.apache.solr.common.cloud.ZkStateReader$2 process INFO: A cluster state change has occurred - updating... Mar 20, 2013 5:12:54 PM org.apache.zookeeper.ClientCnxn$EventThread processEvent SEVERE: Error while calling watcher java.lang.NullPointerException at org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) We ended up upgrading to solr4.2 and rebuilding the whole index from our datastore. -Gary On Sat, Mar 16, 2013 at 9:51 AM, Mark Miller wrote: > Yeah, I don't know that I've ever tried with 4.0, but I've done this with > 4.1 and 4.2. > > - Mark > > On Mar 16, 2013, at 12:19 PM, Gary Yngve wrote: > > > Cool, I'll need to try this. I could have sworn that it didn't work that > > way in 4.0, but maybe my test was bunk. > > > > -g > > > > > > On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller > wrote: > >> > >> You can do this - just modify your starting Solr example to have no > cores > >> in solr.xml. You won't be able to make use of the admin UI until you > create > >> at least one core, but the core and collection apis will both work fine. > >
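A sketch of the chroot trick mentioned above, assuming the 4.x example layout and hypothetical hosts/paths: the chroot is simply appended to the end of the zkHost connect string, and it has to exist in ZooKeeper before the first node starts.

    # Create the chroot node once, then point each environment's Solr nodes at it.
    example/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd makepath /solr-staging
    java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-staging -jar start.jar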
Re: overseer queue clogged
Cool, I'll need to try this. I could have sworn that it didn't work that way in 4.0, but maybe my test was bunk. -g On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller wrote: > > You can do this - just modify your starting Solr example to have no cores > in solr.xml. You won't be able to make use of the admin UI until you create > at least one core, but the core and collection apis will both work fine.
Re: overseer queue clogged
I will upgrade to 4.2 this weekend and see what happens. We are on ec2 and have had a few issues with hostnames with both zk and solr. (but in this case i haven't rebooted any instances either) it's relatively a pain to do the upgrade because we have a query/scorer fork of lucene along with supplemental jars, and zk cannot distribute binary jars via the config. we are also multi-collection per zk... i wish it didn't require a core always defined up front for the core admin? i would love to have an instance have no cores and then just create the core i need.. -g On Fri, Mar 15, 2013 at 7:14 PM, Mark Miller wrote: > > On Mar 15, 2013, at 10:04 PM, Gary Yngve wrote: > > > i think those followers are red from trying to forward requests to the > > overseer while it was being restarted. i guess i'll see if they become > > green over time. or i guess i can restart them one at a time.. > > Restarting the cluster clear things up. It shouldn't take too long for > those nodes to recover though - they should have been up to date before. > The couple exceptions you posted def indicate something is out of whack. > It's something I'd like to get to the bottom of. > > - Mark > > > > > > > On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve > wrote: > > > >> it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers > >> are red now in the solr cloud graph.. trying to figure out what that > >> means... > >> > >> > >> On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve > wrote: > >> > >>> I restarted the overseer node and another took over, queues are empty > now. > >>> > >>> the server with core production_things_shard1_2 > >>> is having these errors: > >>> > >>> shard update error RetryNode: > >>> > http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException > : > >>> Server refused connection at: > >>> http://10.104.59.189:8883/solr/production_things_shard11_replica1 > >>> > >>> for shard11!!! > >>> > >>> I also got some strange errors on the restarted node. Makes me wonder > if > >>> there is a string-matching bug for shard1 vs shard11? 
> >>> > >>> SEVERE: :org.apache.solr.common.SolrException: Error getting leader > from > >>> zk > >>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771) > >>> at org.apache.solr.cloud.ZkController.register(ZkController.java:683) > >>> at org.apache.solr.cloud.ZkController.register(ZkController.java:634) > >>> at > >>> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890) > >>> at > >>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874) > >>> at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823) > >>> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633) > >>> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) > >>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > >>> at java.util.concurrent.FutureTask.run(FutureTask.java:166) > >>> at > >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > >>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > >>> at java.util.concurrent.FutureTask.run(FutureTask.java:166) > >>> at > >>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > >>> at > >>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > >>> at java.lang.Thread.run(Thread.java:722) > >>> Caused by: org.apache.solr.common.SolrException: There is conflicting > >>> information about the leader > >>> of shard: shard1 our state says: > >>> http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http > >>> ://10.217.55.151:8883/solr/collection1/ > >>> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756) > >>> > >>> INFO: Releasing > >>> > directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar > >>> d11_replica1/data/index > >>> Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log > >>> SEVERE: org.apache.solr.common.SolrException: Error opening new > searcher > >>> at org.apache.solr.core.SolrCore.o
Re: overseer queue clogged
i think those followers are red from trying to forward requests to the overseer while it was being restarted. i guess i'll see if they become green over time. or i guess i can restart them one at a time.. On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve wrote: > it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers > are red now in the solr cloud graph.. trying to figure out what that > means... > > > On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve wrote: > >> I restarted the overseer node and another took over, queues are empty now. >> >> the server with core production_things_shard1_2 >> is having these errors: >> >> shard update error RetryNode: >> http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException: >> Server refused connection at: >> http://10.104.59.189:8883/solr/production_things_shard11_replica1 >> >> for shard11!!! >> >> I also got some strange errors on the restarted node. Makes me wonder if >> there is a string-matching bug for shard1 vs shard11? >> >> SEVERE: :org.apache.solr.common.SolrException: Error getting leader from >> zk >> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771) >> at org.apache.solr.cloud.ZkController.register(ZkController.java:683) >> at org.apache.solr.cloud.ZkController.register(ZkController.java:634) >> at >> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890) >> at >> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874) >> at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823) >> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633) >> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) >> at java.util.concurrent.FutureTask.run(FutureTask.java:166) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) >> at java.util.concurrent.FutureTask.run(FutureTask.java:166) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:722) >> Caused by: org.apache.solr.common.SolrException: There is conflicting >> information about the leader >> of shard: shard1 our state says: >> http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http >> ://10.217.55.151:8883/solr/collection1/ >> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756) >> >> INFO: Releasing >> directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar >> d11_replica1/data/index >> Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log >> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423) >> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535) >> >> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on >> state recovering for 10.76.31. >> 67:8883_solr but I still do not see the requested state. I see state: >> active live:true >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler >> .java:948) >> >> >> >> >> On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller wrote: >> >>> Strange - we hardened that loop in 4.1 - so I'm not sure what happened >>> here. 
>>> >>> Can you do a stack dump on the overseer and see if you see an Overseer >>> thread running perhaps? Or just post the results? >>> >>> To recover, you should be able to just restart the Overseer node and >>> have someone else take over - they should pick up processing the queue. >>> >>> Any logs you might be able to share could be useful too. >>> >>> - Mark >>> >>> On Mar 15, 2013, at 7:51 PM, Gary Yngve wrote: >>> >>> > Also, looking at overseer_elect, everything looks fine. node is valid >>> and >>> > live. >>> > >>> > >>> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve >>> wrote: >>> > >>> >> Sorry, should have specified. 4.1 >>> >> >>> >> >>> >> >>> >> >>> >>
Re: overseer queue clogged
it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers are red now in the solr cloud graph.. trying to figure out what that means... On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve wrote: > I restarted the overseer node and another took over, queues are empty now. > > the server with core production_things_shard1_2 > is having these errors: > > shard update error RetryNode: > http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException: > Server refused connection at: > http://10.104.59.189:8883/solr/production_things_shard11_replica1 > > for shard11!!! > > I also got some strange errors on the restarted node. Makes me wonder if > there is a string-matching bug for shard1 vs shard11? > > SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk > at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771) > at org.apache.solr.cloud.ZkController.register(ZkController.java:683) > at org.apache.solr.cloud.ZkController.register(ZkController.java:634) > at > org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890) > at > org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874) > at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823) > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633) > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:722) > Caused by: org.apache.solr.common.SolrException: There is conflicting > information about the leader > of shard: shard1 our state says: > http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http > ://10.217.55.151:8883/solr/collection1/ > at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756) > > INFO: Releasing > directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar > d11_replica1/data/index > Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423) > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535) > > SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state > recovering for 10.76.31. > 67:8883_solr but I still do not see the requested state. I see state: > active live:true > at > org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler > .java:948) > > > > > On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller wrote: > >> Strange - we hardened that loop in 4.1 - so I'm not sure what happened >> here. >> >> Can you do a stack dump on the overseer and see if you see an Overseer >> thread running perhaps? Or just post the results? >> >> To recover, you should be able to just restart the Overseer node and have >> someone else take over - they should pick up processing the queue. >> >> Any logs you might be able to share could be useful too. 
>> >> - Mark >> >> On Mar 15, 2013, at 7:51 PM, Gary Yngve wrote: >> >> > Also, looking at overseer_elect, everything looks fine. node is valid >> and >> > live. >> > >> > >> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve >> wrote: >> > >> >> Sorry, should have specified. 4.1 >> >> >> >> >> >> >> >> >> >> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller > >wrote: >> >> >> >>> What Solr version? 4.0, 4.1 4.2? >> >>> >> >>> - Mark >> >>> >> >>> On Mar 15, 2013, at 7:19 PM, Gary Yngve wrote: >> >>> >> >>>> my solr cloud has been running fine for weeks, but about a week ago, >> it >> >>>> stopped dequeueing from the overseer queue, and now there are >> thousands >> >>> of >> >>>> tasks on the queue, most which look like >> >>>> >> >>>> { >> >>>> "operation":"state", >> >>>> "numShards":null, >> >>>> "shard":"shard3", >> >>>> "roles":null, >> >>>> "state":"recovering", >> >>>> "core":"production_things_shard3_2", >> >>>> "collection":"production_things", >> >>>> "node_name":"10.31.41.59:8883_solr", >> >>>> "base_url":"http://10.31.41.59:8883/solr"} >> >>>> >> >>>> i'm trying to create a new collection through collection API, and >> >>>> obviously, nothing is happening... >> >>>> >> >>>> any suggestion on how to fix this? drop the queue in zk? >> >>>> >> >>>> how could did it have gotten in this state in the first place? >> >>>> >> >>>> thanks, >> >>>> gary >> >>> >> >>> >> >> >> >> >
Re: overseer queue clogged
I restarted the overseer node and another took over, queues are empty now. the server with core production_things_shard1_2 is having these errors: shard update error RetryNode: http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.104.59.189:8883/solr/production_things_shard11_replica1 for shard11!!! I also got some strange errors on the restarted node. Makes me wonder if there is a string-matching bug for shard1 vs shard11? SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771) at org.apache.solr.cloud.ZkController.register(ZkController.java:683) at org.apache.solr.cloud.ZkController.register(ZkController.java:634) at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890) at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874) at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard1 our state says:http://10.104.59.189:8883/solr/collection1/but zookeeper says:http ://10.217.55.151:8883/solr/collection1/ at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756) INFO: Releasing directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar d11_replica1/data/index Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535) SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state recovering for 10.76.31. 67:8883_solr but I still do not see the requested state. I see state: active live:true at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler .java:948) On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller wrote: > Strange - we hardened that loop in 4.1 - so I'm not sure what happened > here. > > Can you do a stack dump on the overseer and see if you see an Overseer > thread running perhaps? Or just post the results? > > To recover, you should be able to just restart the Overseer node and have > someone else take over - they should pick up processing the queue. > > Any logs you might be able to share could be useful too. > > - Mark > > On Mar 15, 2013, at 7:51 PM, Gary Yngve wrote: > > > Also, looking at overseer_elect, everything looks fine. node is valid > and > > live. > > > > > > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve > wrote: > > > >> Sorry, should have specified. 4.1 > >> > >> > >> > >> > >> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller >wrote: > >> > >>> What Solr version? 
4.0, 4.1 4.2? > >>> > >>> - Mark > >>> > >>> On Mar 15, 2013, at 7:19 PM, Gary Yngve wrote: > >>> > >>>> my solr cloud has been running fine for weeks, but about a week ago, > it > >>>> stopped dequeueing from the overseer queue, and now there are > thousands > >>> of > >>>> tasks on the queue, most which look like > >>>> > >>>> { > >>>> "operation":"state", > >>>> "numShards":null, > >>>> "shard":"shard3", > >>>> "roles":null, > >>>> "state":"recovering", > >>>> "core":"production_things_shard3_2", > >>>> "collection":"production_things", > >>>> "node_name":"10.31.41.59:8883_solr", > >>>> "base_url":"http://10.31.41.59:8883/solr"} > >>>> > >>>> i'm trying to create a new collection through collection API, and > >>>> obviously, nothing is happening... > >>>> > >>>> any suggestion on how to fix this? drop the queue in zk? > >>>> > >>>> how could did it have gotten in this state in the first place? > >>>> > >>>> thanks, > >>>> gary > >>> > >>> > >> > >
Re: overseer queue clogged
Also, looking at overseer_elect, everything looks fine. node is valid and live. On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve wrote: > Sorry, should have specified. 4.1 > > > > > On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller wrote: > >> What Solr version? 4.0, 4.1 4.2? >> >> - Mark >> >> On Mar 15, 2013, at 7:19 PM, Gary Yngve wrote: >> >> > my solr cloud has been running fine for weeks, but about a week ago, it >> > stopped dequeueing from the overseer queue, and now there are thousands >> of >> > tasks on the queue, most which look like >> > >> > { >> > "operation":"state", >> > "numShards":null, >> > "shard":"shard3", >> > "roles":null, >> > "state":"recovering", >> > "core":"production_things_shard3_2", >> > "collection":"production_things", >> > "node_name":"10.31.41.59:8883_solr", >> > "base_url":"http://10.31.41.59:8883/solr"} >> > >> > i'm trying to create a new collection through collection API, and >> > obviously, nothing is happening... >> > >> > any suggestion on how to fix this? drop the queue in zk? >> > >> > how could did it have gotten in this state in the first place? >> > >> > thanks, >> > gary >> >> >
Re: overseer queue clogged
Sorry, should have specified. 4.1 On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller wrote: > What Solr version? 4.0, 4.1 4.2? > > - Mark > > On Mar 15, 2013, at 7:19 PM, Gary Yngve wrote: > > > my solr cloud has been running fine for weeks, but about a week ago, it > > stopped dequeueing from the overseer queue, and now there are thousands > of > > tasks on the queue, most which look like > > > > { > > "operation":"state", > > "numShards":null, > > "shard":"shard3", > > "roles":null, > > "state":"recovering", > > "core":"production_things_shard3_2", > > "collection":"production_things", > > "node_name":"10.31.41.59:8883_solr", > > "base_url":"http://10.31.41.59:8883/solr"} > > > > i'm trying to create a new collection through collection API, and > > obviously, nothing is happening... > > > > any suggestion on how to fix this? drop the queue in zk? > > > > how could did it have gotten in this state in the first place? > > > > thanks, > > gary > >
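For the record, the state of the queue can be inspected (and, as a last resort, cleared) with ZooKeeper's own CLI; a sketch with a hypothetical zk host. Clearing it should only be done with the Solr nodes stopped, and restarting the Overseer node, as discussed elsewhere in this thread, is the gentler fix.

    # How deep is the overseer queue?
    zkCli.sh -server zk1:2181 ls /overseer/queue

    # Last resort, with all Solr nodes stopped: drop the queued tasks.
    zkCli.sh -server zk1:2181 rmr /overseer/queue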
Re: How to use shardId
the param in solr.xml should be shard, not shardId. i tripped over this too. -g On Mon, Jan 14, 2013 at 7:01 AM, starbuck wrote: > Hi all, > > I am trying to realize a solr cloud cluster with 2 collections and 4 shards > each with 2 replicates hosted by 4 solr instances. If shardNum parm is set > to 4 and all solr instances are started after each other it seems to work > fine. > > What I wanted to do now is removing shardNum from JAVA_OPTS and defining > each core with a "shardId". Here is my current solr.xml of the first and > second (in the second there is another instanceDir, the rest is the same) > solr instance: > > > > Here is solr.xml of the third and fourth solr instance: > > > > But it seems that solr doesn't accept the shardId or omits it. What I > really > get is 2 collections each with 2 shards and 8 replicates (each solr > instance > 2) > Either the functionality is not really clear to me or there has to be a > config failure. > > It would very helpful if anyone could give me a hint. > > Thanks. > starbuck > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-use-shardId-tp4033186.html > Sent from the Solr - User mailing list archive at Nabble.com. >
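The CoreAdmin API equivalent, for reference (names hypothetical); here too the parameter is shard, not shardId:

    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=coll1_shard2_replica1&instanceDir=coll1_shard2_replica1&collection=coll1&shard=shard2'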
solr4.1 createNodeSet requires ip addresses?
Hi all, I've been unable to get the collections create API to work with createNodeSet containing hostnames, both localhost and external hostnames. I've only been able to get it working when using explicit IP addresses. It looks like zk stores the IP addresses in the clusterstate.json and live_nodes. Is it possible that Solr Cloud is not doing any hostname resolving but just looking for an explicit match with createNodeSet? This is kind of annoying, in that I am working with EC2 instances and consider it pretty lame to need to use elastic IPs for internal use. I'm hacking around it now (looking up the eth0 inet addr on each machine), but I'm not happy about it. Has anyone else found a better solution? The reason I want to specify explicit nodes for collections is so I can have just one zk ensemble managing collections across different environments that will go up and down independently of each other. Thanks, Gary
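A rough sketch of the workaround described above (EC2 metadata endpoint for the local private IP; collection name and second node hypothetical). createNodeSet entries must match the node names ZooKeeper has under live_nodes, i.e. host:port_solr:

    # Resolve this instance's private IP and create the collection on
    # explicitly named nodes.
    MYIP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=env1_things&numShards=2&replicationFactor=1&createNodeSet=${MYIP}:8983_solr,10.0.0.12:8983_solr"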
Re: incorrect solr update behavior
Of course, as soon as I post this, I discover this: https://issues.apache.org/jira/browse/SOLR-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537900#comment-13538174 i'll give this patch a spin in the morning. (this is not an example of how to use antecedents :)) -g On Mon, Jan 14, 2013 at 6:27 PM, Gary Yngve wrote: > Posting this > > update="set">blah update="add">qux update="add">quuxfoo > > to an existing doc with foo and bar tags > results in tags_ss containing > > {add=qux} > {add=quux} > > > whereas posting this > > update="set">blah update="add">quxfoo > > results in the expected behavior: > > foo > bar > qux > > > Any ideas? > > Thanks, > Gary >
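For reference, a minimal sketch of the update being discussed, posted with curl rather than from code; the id value and the title_s field are hypothetical, tags_ss is the multi-valued field from the message above:

    curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' -d \
    '<add><doc>
      <field name="id">foo</field>
      <field name="title_s" update="set">blah</field>
      <field name="tags_ss" update="add">qux</field>
      <field name="tags_ss" update="add">quux</field>
    </doc></add>'

Without the SOLR-4134 patch, the two update="add" values come back as the literal {add=...} maps described above instead of being appended to the existing values.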
RE: dih groovy script question
Looks like some sort of foul-up with Groovy versions and Solr 3.6.1 as I had to roll back to Groovy 1.7.10 to get this to work. Started with Groovy 2 and then 1.8 before 1.7.10. What's odd is that I implemented the same calls made in ScriptTransformer.java in a test program and they worked fine with all Groovy versions. Can't imagine what the root cause might be -- Groovy implements jsr223 differently in later versions? I suppose to find out I could compile Solr with my jdk but time to march on. ;) Gary -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, September 15, 2012 9:01 AM To: solr-user@lucene.apache.org Subject: Re: dih groovy script question Stab in the dark... This looks like you're somehow getting the wrong Groovy jars. Can you print out the Groovy version as a test? Perhaps you have one groovy version in your command-line and copied a different version into the libraries Solr knows about? Because this looks like a pure Groovy error Best Erick On Thu, Sep 13, 2012 at 9:03 PM, Moore, Gary wrote: > I'm a bit stumped as to why I can't get a groovy script to run from the DIH. > I'm sure it's something braindead I'm missing. The script looks like this > in data-config.xml: > > <![CDATA[ > import java.security.MessageDigest > import java.util.HashMap > def createHashId(HashMap<String,Object>row, > org.apache.solr.handler.dataimport.ContextImpl context ) { > // do groovy stuff > return row } ]]> > > When I run the import, I get the following error: > > > Caused by: java.lang.NoSuchMethodException: No signature of method: > org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is > applicable for argument types: (java.util.HashMap, > org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, > Credits:, EnteredBy:Corey Holland, ...], ...] > at > org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364) > at > org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160) > ... 13 more > > The script runs fine from the shell so I don't believe there are any groovy > errors. Thanks in advance for any tips. > Gary > > > > > This electronic message contains information generated by the USDA solely for > the intended recipients. Any unauthorized interception of this message or the > use or disclosure of the information it contains may violate the law and > subject the violator to civil or criminal penalties. If you believe you have > received this message in error, please notify the sender and delete the email > immediately.
dih groovy script question
I'm a bit stumped as to why I can't get a groovy script to run from the DIH. I'm sure it's something braindead I'm missing. The script looks like this in data-config.xml: <![CDATA[ import java.security.MessageDigest import java.util.HashMap def createHashId(HashMap<String,Object>row, org.apache.solr.handler.dataimport.ContextImpl context ) { // do groovy stuff return row } ]]> When I run the import, I get the following error: Caused by: java.lang.NoSuchMethodException: No signature of method: org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is applicable for argument types: (java.util.HashMap, org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, Credits:, EnteredBy:Corey Holland, ...], ...] at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364) at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160) ... 13 more The script runs fine from the shell so I don't believe there are any groovy errors. Thanks in advance for any tips. Gary This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
Payloads slowing down add/delete doc
Hi there,

In order to keep a DocID-to-UID map, we added a payload field to a Solr core. Searching on UID is very fast, but we now have a problem adding/deleting docs: every time we commit an add or delete, Solr/Lucene takes up to 30 seconds to complete. Without the payload, the same action finishes in milliseconds. We do need (near) real-time commits.

Here is the payload definition:

Any suggestions? Any help is appreciated.

Best Regards
G. Y.
DIH doesn't handle bound namespaces?
I'm trying to import some MODS XML using DIH. The XML uses bound namespacing: http://www.w3.org/2001/XMLSchema-instance"; xmlns:mods="http://www.loc.gov/mods/v3"; xmlns:xlink="http://www.w3.org/1999/xlink"; xmlns="http://www.loc.gov/mods/v3"; xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/mods/v3/mods-3-4.xsd"; version="3.4"> Malus domestica: Arnold However, XPathEntityProcessor doesn't seem to handle xpaths of the type xpath="//mods:titleInfo/mods:title". If I remove the namespaces from the source XML: http://www.w3.org/2001/XMLSchema-instance"; xmlns:mods="http://www.loc.gov/mods/v3"; xmlns:xlink="http://www.w3.org/1999/xlink"; xmlns="http://www.loc.gov/mods/v3"; xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/mods/v3/mods-3-4.xsd"; version="3.4"> Malus domestica: Arnold then xpath="//titleInfo/title" works just fine. Can anyone confirm that this is the case and, if so, recommend a solution? Thanks Gary Gary Moore Technical Lead LCA Digital Commons Project NAL/ARS/USDA
Re: query for point in time
Thanks for the reply. We had the search within the database initially, but it proven to be too slow. With solr we have much better performance. One more question, how could I find the most current job for each employee My data looks like John Smith department A web site bug fix 2010-01-01 2010-01-03 unit testing 2010-01-04 2010-01-06 QA support 2010-01-07 2010-01-12 implementation 2010-01-13 2010-01-22 Jane Doe department A QA support 2010-01-01 2010-05-01 implementation 2010-05-02 2010-09-28 Joe Doe department APHP development 2011-01-01 2011-08-31 Java Development 2011-09-01 2011-09-15 I would like to return this as my search result John Smith department Aimplementation 2010-01-13 2010-01-22 Jane Doe department Aimplementation 2010-05-02 2010-09-28 Joe Doedepartment AJava Development 2011-09-01 2011-09-15 Thanks in advance Gary On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind wrote: > You didn't tell us what your schema looks like, what fields with what types > are involved. > > But similar to how you'd do it in your database, you need to find > 'documents' that have a start date before your date in question, and an end > date after your date in question, to find the ones whose range includes your > date in question. > > Something like this: > > q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *] > > Of course, you need to add on your restriction to just documents about > 'John Smith', through another AND clause or an 'fq'. > > But in general, if you've got a db with this info already, and this is all > you need, why not just use the db? Multi-hieararchy data like this is going > to give you trouble in Solr eventually, you've got to arrange the solr > indexes/schema to answer your questions, and eventually you're going to have > two questions which require mutually incompatible schema to answer. > > An rdbms is a great general purpose question answering tool for structured > data. lucene/Solr is a great indexing tool for text matching. > > > On 9/15/2011 2:55 PM, gary tam wrote: > >> Hi >> >> I have a scenario that I am not sure how to write the query for. >> >> Here is the scenario - have an employee record with multi value for >> project, >> started date, end date. >> >> looks something like >> >> >> John Smith web site bug fix 2010-01-01 2010-01-03 >> unit testing 2010-01-04 >> 2010-01-06 >> QA support 2010-01-07 >> 2010-01-12 >> implementation 2010-01-13 >> 2010-01-22 >> >> I want to find what project John Smith was working on 2010-01-05 >> >> Is this possible or I have to back to my database ? >> >> >> Thanks >> >>
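A sketch of both queries with hypothetical core and field names. The point-in-time lookup is a pair of range clauses (Solr dates need full ISO-8601 values rather than quoted strings), and "latest job per employee" is straightforward if each project assignment is indexed as its own document, using field collapsing:

    # Which project was John Smith on at 2010-01-05?
    curl 'http://localhost:8983/solr/employees/select' \
      --data-urlencode 'q=name:"John Smith"' \
      --data-urlencode 'fq=start_date:[* TO 2010-01-05T23:59:59Z] AND end_date:[2010-01-05T00:00:00Z TO *]'

    # Most recent assignment per employee (one document per assignment assumed).
    curl 'http://localhost:8983/solr/employees/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'group=true' \
      --data-urlencode 'group.field=employee_id' \
      --data-urlencode 'group.limit=1' \
      --data-urlencode 'group.sort=end_date desc'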
query for point in time
Hi,

I have a scenario that I am not sure how to write the query for. Here is the scenario: an employee record with multi-valued fields for project, start date, and end date. It looks something like:

  John Smith
    web site bug fix   2010-01-01   2010-01-03
    unit testing       2010-01-04   2010-01-06
    QA support         2010-01-07   2010-01-12
    implementation     2010-01-13   2010-01-22

I want to find which project John Smith was working on on 2010-01-05.

Is this possible, or do I have to go back to my database?

Thanks
RE: how to run solr in apache server?
Solr only runs in a container. To make it appear as if Solr is "running" on httpd, Google 'httpd tomcat' for instructions on how to front tomcat with httpd mod_jk or mod_proxy. Our system admins prefer mod_proxy. Not sure why you'd need to front Solr with httpd since it's usually an application backend, e.g. a PHP application running on port 80 connects to Solr on port 8983. Gary -Original Message- From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] Sent: Wednesday, September 07, 2011 7:41 AM To: solr-user@lucene.apache.org Subject: how to run solr in apache server? Hi everybody... can anybody tell me how to run solr on Apache server(not apache tomcat) Thnax in advance -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-run-solr-in-apache-server-tp3316377p3316377.html Sent from the Solr - User mailing list archive at Nabble.com.
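A Debian/Ubuntu-flavoured sketch of the mod_proxy approach (paths hypothetical), so that Solr answers under /solr on port 80 while still running in its own container on 8983:

    # Add these two directives to the public site's vhost / httpd.conf:
    #   ProxyPass        /solr http://localhost:8983/solr
    #   ProxyPassReverse /solr http://localhost:8983/solr
    sudo a2enmod proxy proxy_http
    sudo service apache2 restart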
RE: commas in synonyms.txt are not escaping
Hah, I knew it was something simple. :) Thanks. Gary -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Sunday, August 28, 2011 12:50 PM To: solr-user@lucene.apache.org Subject: Re: commas in synonyms.txt are not escaping Turns out this isn't a bug - I was just tripped up by the analysis changes to the example server. Gary, you are probably just hitting the same thing. The "text" fieldType is no longer used by any fields by default - for example the "text" field uses the "text_general" fieldType. This fieldType uses the standard tokenizer, which discards stuff like commas (hence the synonym will never match). -Yonik http://www.lucidimagination.com
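For anyone else chasing the same thing, the per-fieldtype analysis can also be checked outside the admin UI via the field analysis handler shipped in the example config (host and field type hypothetical); with the standard tokenizer the comma is gone before the synonym filter ever runs:

    curl 'http://localhost:8983/solr/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=2,4-D-butotyl&wt=json'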
RE: commas in synonyms.txt are not escaping
Alexi, Yes but no difference. This is apparently an issue introduced in 3.*. Thanks for your help. -Gary -Original Message- From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] Sent: Friday, August 26, 2011 10:45 AM To: solr-user@lucene.apache.org Subject: Re: commas in synonyms.txt are not escaping Gary, isn't your wordDelimiter removing your commas in the query time? have u tried it in the analyzer? 2011/8/26 Moore, Gary > Here you go -- I'm just hacking the text field at the moment. Thanks, > Gary > > > > > synonyms="index_synonyms.txt" > tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" > expand="true"/> > >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > > words="stopwords.txt"/> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > -Original Message- > From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] > Sent: Friday, August 26, 2011 10:30 AM > To: solr-user@lucene.apache.org > Subject: Re: commas in synonyms.txt are not escaping > > Gary, please post the entire field declaration so I can try to reproduce > here > > > -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
RE: commas in synonyms.txt are not escaping
Thanks, Yonik. Gary -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, August 26, 2011 11:25 AM To: solr-user@lucene.apache.org Subject: Re: commas in synonyms.txt are not escaping On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley wrote: > On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary wrote: >> >> I have a number of chemical names containing commas which I'm mapping in >> index_synonyms.txt thusly: >> >> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D >> 3,CCRIS 8562 >> >> According to the sample synonyms.txt, the comma above should be. i.e. >> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are >> not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I >> paste in 2\,4-D-butotyl, the mappings are done. > > > I can confirm that this works in 1.4, but no longer works in 3x or > trunk. Can you open an issue? Actually, I think I've tracked it to LUCENE-3233 where the parsing rules were moved from Solr to Lucene (and changed the functionality in the process). I'll reopen t hat since I don't think it's been in a released version yet. -Yonik http://www.lucidimagination.com
RE: commas in synonyms.txt are not escaping
Here you go -- I'm just hacking the text field at the moment. Thanks, Gary -Original Message- From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] Sent: Friday, August 26, 2011 10:30 AM To: solr-user@lucene.apache.org Subject: Re: commas in synonyms.txt are not escaping Gary, please post the entire field declaration so I can try to reproduce here
commas in synonyms.txt are not escaping
I have a number of chemical names containing commas which I'm mapping in index_synonyms.txt thusly: 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562 According to the sample synonyms.txt, the comma above should be escaped, i.e. a\,a=>b\,b. The problem is that according to analysis.jsp the commas are not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I paste in 2\,4-D-butotyl, the mappings are done. This is verified by there being no mappings in the index. I assume there would be if 2\,4-D-butotyl actually appeared in a document. The filter I'm declaring in the index analyzer looks like this: Doesn't seem to matter which tokenizer I use. This must be something simple that I'm not doing but am a bit stumped at the moment and would appreciate any tips. Thanks Gary
Re: tika integration exception and other related queries
Naveen, Not sure our requirement matches yours, but one of the things we index is a "comment" item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional attached files. This is submitted to Solr as a TEXT field in an XML doc, along with other meta-data fields from the comment. In our schema the TEXT field is indexed but not stored, so when we search and get a match back it doesn't contain all of the contents from the attached files etc., only the stored fields in our schema. Admittedly, the user can therefore get back a "comment" match with no indication as to WHERE the match occurred (ie. was it in the meta-data or the contents of the attached files), but at the moment we're only interested in getting appropriate matches, not explaining where the match is. Hope that helps. Kind regards, Gary. On 09/06/2011 03:00, Naveen Gupta wrote: Hi Gary It started working .. though i did not test for Zip files, but for rar files, it is working fine .. only thing what i wanted to do is to index the metadata (text mapped to content) not store the data Also in search result, i want to filter the stuffs ... and it started working fine .. i don't want to show the content stuffs to the end user, since the way it extracts the information is not very helpful to the user .. although we can apply few of the analyzers and filters to remove the unnecessary tags ..still the information would not be of much help .. looking for your opinion ... what you did in order to filter out the content or are you showing the content extracted to the end user? Even in case, we are showing the text part to the end user, how can i limit the number of characters while querying the search results ... is there any feature where we can achieve this ... the concept of snippet kind of thing ... Thanks Naveen On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor wrote: Naveen, For indexing Zip files with Tika, take a look at the following thread : http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On 08/06/2011 04:12, Naveen Gupta wrote: Hi Can somebody answer this ... 3. can somebody tell me an idea how to do indexing for a zip file ? 1. while sending docx, we are getting following error.
Re: tika integration exception and other related queries
Naveen, For indexing Zip files with Tika, take a look at the following thread : http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On 08/06/2011 04:12, Naveen Gupta wrote: Hi Can somebody answer this ... 3. can somebody tell me an idea how to do indexing for a zip file ? 1. while sending docx, we are getting following error.
Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)
Jayendra, I cleared out my local repository, and replayed all of my steps from Friday and now it works. The only difference (or the only one that's obvious to me) was that I applied the patch before doing a full compile/test/dist. But I assumed that given I was seeing my new log entries (from ExtractingDocumentLoader.java) I was running the correct code anyway. However, I'm very pleased that it's working now - I get the full contents of the zipped files indexed and not just the file names. Thank you again for your assistance, and the patch! Kind regards, Gary. On 21/05/2011 03:12, Jayendra Patil wrote: Hi Gary, I tried the patch on the 3.1 source code (@ http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/) as well and it worked fine. @Patch - https://issues.apache.org/jira/browse/SOLR-2416, which deals with the Solr Cell module. You may want to verify the contents from the results by enabling the stored attribute on the text field. e.g. curl "http://localhost:8983/solr/update/extract?stream.file=C:/Test.zip&literal.id=777045&literal.title=Test&commit=true" Let me know if it works. I would be happy to share the generated artifact you can test on. Regards, Jayendra
Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)
Hello again. Unfortunately, I'm still getting nowhere with this. I have checked-out the 3.1 source and applied Jayendra's patches (see below) and it still appears that the contents of the files in the zipfile are not being indexed, only the filenames of those contained files. I'm using a simple CURL invocation to test this: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; -F "commit=true" -F "file=@solr1.zip" solr1.zip contains two simple txt files (doc1.txt and doc2.txt). I'm expecting the contents of those txt files to be extracted from the zip and indexed, but this isn't happening - or at least, I don't get the desired result when I do a query afterwards. I do get a match if I search for either "doc1.txt" or "doc2.txt", but not if I search for a word that appears in their contents. If I index one of the txt files (instead of the zipfile), I can query the content OK, so I'm assuming my query is sensible and matches the field specified on the CURL string (ie. "text"). I'm also happy that the Solr Cell content extraction is working because I can successfully index PDF, Word, etc. files. In a fit of desperation I have added log.info statements into the files referenced by Jayendra's patches (SOLR-2416 and SOLR-2332) and I see those in the log when I submit the zipfile with CURL, so I know I'm running those patched files in the build. If anyone can shed any light on what's happening here, I'd be very grateful. Thanks and kind regards, Gary. On 11/04/2011 11:12, Gary Taylor wrote: Jayendra, Thanks for the info - been keeping an eye on this list in case this topic cropped up again. It's currently a background task for me, so I'll try and take a look at the patches and re-test soon. Joey - glad you brought this issue up again. I haven't progressed any further with it. I've not yet moved to Solr 3.1 but it's on my to-do list, as is testing out the patches referenced by Jayendra. I'll post my findings on this thread - if you manage to test the patches before me, let me know how you get on. Thanks and kind regards, Gary. On 11/04/2011 05:02, Jayendra Patil wrote: The migration of Tika to the latest 0.8 version seems to have reintroduced the issue. I was able to get this working again with the following patches. (Solr Cell and Data Import handler) https://issues.apache.org/jira/browse/SOLR-2416 https://issues.apache.org/jira/browse/SOLR-2332 You can try these. Regards, Jayendra On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel wrote: Hi Gary, I have been experiencing the same problem... Unable to extract content from archive file formats. I just tried again with a clean install of Solr 3.1.0 (using Tika 0.8) and continue to experience the same results. Did you have any success with this problem with Solr 1.4.1 or 3.1.0 ? I'm using this curl command to send data to Solr. curl " http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true"; -H "application/octet-stream" -F "myfile=@data.zip" No problem extracting single rich text documents, but archive files only result in the file names within the archive being indexed. Am I missing something else in my configuration? Solr doesn't seem to be unpacking the archive files. Based on the email chain associated with your first message, some people have been able to get this functionality to work as desired. 
-- Gary Taylor INOVEM Tel +44 (0)1488 648 480 Fax +44 (0)7092 115 933 gary.tay...@inovem.com www.inovem.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
Seattle Solr/Lucene User Group?
Hi all, Does anyone know if there is a Solr/Lucene user group / birds-of-feather that meets in Seattle? If not, I'd like to start one up. I'd love to learn and share tricks pertaining to NRT, performance, distributed solr, etc. Also, I am planning on attending the Lucene Revolution! Let's connect! -Gary http://www.linkedin.com/in/garyyngve
Re: Extracting contents of zipped files with Tika and Solr 1.4.1
Jayendra, Thanks for the info - been keeping an eye on this list in case this topic cropped up again. It's currently a background task for me, so I'll try and take a look at the patches and re-test soon. Joey - glad you brought this issue up again. I haven't progressed any further with it. I've not yet moved to Solr 3.1 but it's on my to-do list, as is testing out the patches referenced by Jayendra. I'll post my findings on this thread - if you manage to test the patches before me, let me know how you get on. Thanks and kind regards, Gary. On 11/04/2011 05:02, Jayendra Patil wrote: The migration of Tika to the latest 0.8 version seems to have reintroduced the issue. I was able to get this working again with the following patches. (Solr Cell and Data Import handler) https://issues.apache.org/jira/browse/SOLR-2416 https://issues.apache.org/jira/browse/SOLR-2332 You can try these. Regards, Jayendra On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel wrote: Hi Gary, I have been experiencing the same problem... Unable to extract content from archive file formats. I just tried again with a clean install of Solr 3.1.0 (using Tika 0.8) and continue to experience the same results. Did you have any success with this problem with Solr 1.4.1 or 3.1.0 ? I'm using this curl command to send data to Solr. curl " http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true"; -H "application/octet-stream" -F "myfile=@data.zip" No problem extracting single rich text documents, but archive files only result in the file names within the archive being indexed. Am I missing something else in my configuration? Solr doesn't seem to be unpacking the archive files. Based on the email chain associated with your first message, some people have been able to get this functionality to work as desired. -- Gary Taylor INOVEM Tel +44 (0)1488 648 480 Fax +44 (0)7092 115 933 gary.tay...@inovem.com www.inovem.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
Re: adding a document using curl
As an example, I run this in the same directory as the msword1.doc file: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5"; -F "file=@msword1.doc" The "type" literal is just part of my schema. Gary. On 03/03/2011 11:45, Ken Foskey wrote: On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote: Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL I should have been clearer. A rich text document, XML I can make work and a script is in the example docs folder http://wiki.apache.org/solr/ExtractingRequestHandler I also read the solr 1.4 book and tried samples in there, could not make them work. Ta
Re: Extracting contents of zipped files with Tika and Solr 1.4.1
Can anyone shed any light on this, and whether it could be a config issue? I'm now using the latest SVN trunk, which includes the Tika 0.8 jars. When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) to the ExtractingRequestHandler, I get the following log entry (formatted for ease of reading) : SolrInputDocument[ { ignored_meta=ignored_meta(1.0)={ [stream_source_info, file, stream_content_type, application/octet-stream, stream_size, 260, stream_name, solr1.zip, Content-Type, application/zip] }, ignored_=ignored_(1.0)={ [package-entry, package-entry] }, ignored_stream_source_info=ignored_stream_source_info(1.0)={file}, ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream}, ignored_stream_size=ignored_stream_size(1.0)={260}, ignored_stream_name=ignored_stream_name(1.0)={solr1.zip}, ignored_content_type=ignored_content_type(1.0)={application/zip}, docid=docid(1.0)={74}, type=type(1.0)={5}, text=text(1.0)={ doc2.txtdoc1.txt} } ] So, the data coming back from Tika when parsing a ZIP file does not include the file contents, only the names of the files contained therein. I've tried forcing stream.type=application/zip in the CURL string, but that makes no difference. If I specify an invalid stream.type then I get an exception response, so I know it's being used. When I send one of those txt files individually to the ExtractingRequestHandler, I get: SolrInputDocument[ { ignored_meta=ignored_meta(1.0)={ [stream_source_info, file, stream_content_type, text/plain, stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt] }, ignored_stream_source_info=ignored_stream_source_info(1.0)={file}, ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain}, ignored_stream_size=ignored_stream_size(1.0)={30}, ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1}, ignored_stream_name=ignored_stream_name(1.0)={doc1.txt}, docid=docid(1.0)={74}, type=type(1.0)={5}, text=text(1.0)={The quick brown fox } } ] and we see the file contents in the "text" field. I'm using the following requestHandler definition in solrconfig.xml: class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy"> text true ignored_ true links ignored_ Is there any further debug or diagnostic I can get out of Tika to help me work out why it's only returning the file names and not the file contents when parsing a ZIP file? Thanks and kind regards, Gary. On 25/01/2011 16:48, Jayendra Patil wrote: Hi Gary, The latest Solr Trunk was able to extract and index the contents of the zip file using the ExtractingRequestHandler. The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and worked pretty well. Tested again with sample url and works fine - curl " http://localhost:8080/solr/core0/update/extract?stream.file=C:/temp/extract/777045.zip&literal.id=777045&literal.title=Test&commit=true " You would probably need to drill down to the Tika Jars and the apache-solr-cell-4.0-dev.jar used for Rich documents indexing. Regards, Jayendra
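For readers following the thread: the requestHandler definition quoted above lost its XML markup in the archive. The values that do survive (text, true, ignored_, true, links, ignored_) line up, in order, with the defaults in the stock example solrconfig.xml of that era, so the sketch below is a reasonable guess at what the stripped block contained; treat the parameter names as an assumption rather than a verbatim copy of Gary's file:

  <requestHandler name="/update/extract"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                  startup="lazy">
    <lst name="defaults">
      <!-- map the main extracted content into the "text" field -->
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
      <!-- capture anchor hrefs as "links"; ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>

These defaults only control field mapping, which is consistent with the thread's eventual conclusion that the zip handling in Solr Cell, rather than this handler configuration, was at fault.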
Re: Extracting contents of zipped files with Tika and Solr 1.4.1
OK, got past the schema.xml problem, but now I'm back to square one. I can index the contents of binary files (Word, PDF etc...), as well as text files, but it won't index the content of files inside a zip. As an example, I have two txt files - doc1.txt and doc2.txt. If I index either of them individually using: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; -F "file=@doc1.txt" and commit, Solr will index the contents and searches will match. If I zip those two files up into solr1.zip, and index that using: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5"; -F "file=@solr1.zip" and commit, the file names are indexed, but not their contents. I have checked that Tika can correctly process the zip file when used standalone with the tika-app jar - it outputs both the filenames and contents. Should I be able to index the contents of files stored in a zip by using extract ? Thanks and kind regards, Gary. On 25/01/2011 15:32, Gary Taylor wrote: Thanks Erlend. Not used SVN before, but have managed to download and build latest trunk code. Now I'm getting an error when trying to access the admin page (via Jetty) because I specify HTMLStripStandardTokenizerFactory in my schema.xml, but this appears to be no-longer supplied as part of the build so I get an exception cos it can't find that class. I've checked the CHANGES.txt and found the following in the change list to 1.4.0 (!?) : 66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji) Unfortunately, I can't seem to get that to work correctly. Does anyone have an example fieldType stanza (for schema.xml) for stripping out HTML ? Thanks and kind regards, Gary. On 25/01/2011 14:17, Erlend Garåsen wrote: On 25.01.11 11.30, Erlend Garåsen wrote: Tika version 0.8 is not included in the latest release/trunk from SVN. Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry. And to clarify, by "content" I mean the main content of a Word file. Title and other kinds of metadata are successfully extracted by the old 0.4 version of Tika, but you need a newer Tika version (0.8) in order to fetch the main content as well. So try the newest Solr version from trunk. Erlend
Re: Extracting contents of zipped files with Tika and Solr 1.4.1
Thanks Erlend. Not used SVN before, but have managed to download and build latest trunk code. Now I'm getting an error when trying to access the admin page (via Jetty) because I specify HTMLStripStandardTokenizerFactory in my schema.xml, but this appears to be no-longer supplied as part of the build so I get an exception cos it can't find that class. I've checked the CHANGES.txt and found the following in the change list to 1.4.0 (!?) : 66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji) Unfortunately, I can't seem to get that to work correctly. Does anyone have an example fieldType stanza (for schema.xml) for stripping out HTML ? Thanks and kind regards, Gary. On 25/01/2011 14:17, Erlend Garåsen wrote: On 25.01.11 11.30, Erlend Garåsen wrote: Tika version 0.8 is not included in the latest release/trunk from SVN. Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry. And to clarify, by "content" I mean the main content of a Word file. Title and other kinds of metadata are successfully extracted by the old 0.4 version of Tika, but you need a newer Tika version (0.8) in order to fetch the main content as well. So try the newest Solr version from trunk. Erlend
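Since the question about an HTML-stripping fieldType stanza goes unanswered in this thread, here is a minimal sketch using the replacement HTMLStripCharFilter; the type name text_html and the downstream tokenizer and filters are illustrative choices, not a recommendation from the original posters:

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- charFilters run on the raw character stream, before tokenizing -->
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The charFilter element was introduced alongside HTMLStripCharFilter (SOLR-1343, mentioned in the CHANGES entry quoted above), so this only works on releases from 1.4.0 onwards.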
Extracting contents of zipped files with Tika and Solr 1.4.1
Hi, I posted a question in November last year about indexing content from multiple binary files into a single Solr document and Jayendra responded with a simple solution to zip them up and send that single file to Solr. I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't currently allow this to work and only the file names of the zipped files are indexed (and not their contents). I've tried downloading and building the latest Tika (0.8) and replacing the tika-parsers and tika-core JARs in \contrib\extraction\lib but this still isn't indexing the file contents, and now doesn't even index the file names! Is there a version of Tika that works with the Solr 1.4.1 released distribution which does index the contents of the zipped files? Thanks and kind regards, Gary
Re: example schema in branch_3x returns SEVERE errors
Sorry, false alarm. Had a bad merge and had a stray library linking to an older version of another library. Works now. -Gary On Sat, Nov 27, 2010 at 4:17 PM, Gary Yngve wrote: > logs> grep SEVERE solr.err.log > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.KeywordMarkerFilterFactory' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.KeywordMarkerFilterFactory' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.KeywordMarkerFilterFactory' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.EnglishMinimalStemFilterFactory' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.PointType' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.LatLonType' > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.GeoHashField' > SEVERE: java.lang.RuntimeException: schema fieldtype > text(org.apache.solr.schema.TextField) invalid > arguments:{autoGeneratePhraseQueries=true} > SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location' > specified on field store > > It looks like it's loading the correct files... > > 010-11-27 13:01:28.005:INFO::Logging to STDERR via > org.mortbay.log.StdErrLog > 2010-11-27 13:01:28.137:INFO::jetty-6.1.22 > 2010-11-27 13:01:28.204:INFO::Extract > file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war > to > /Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp > > And on inspection on the war and the solr-core jar inside, I can see the > missing classes, so I am pretty confused. > > Has anyone else seen this before or have an idea on how to surmount it? > > I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user > error. > > Thanks, > Gary >
example schema in branch_3x returns SEVERE errors
logs> grep SEVERE solr.err.log SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.EnglishMinimalStemFilterFactory' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.PointType' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.LatLonType' SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.GeoHashField' SEVERE: java.lang.RuntimeException: schema fieldtype text(org.apache.solr.schema.TextField) invalid arguments:{autoGeneratePhraseQueries=true} SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location' specified on field store It looks like it's loading the correct files... 010-11-27 13:01:28.005:INFO::Logging to STDERR via org.mortbay.log.StdErrLog 2010-11-27 13:01:28.137:INFO::jetty-6.1.22 2010-11-27 13:01:28.204:INFO::Extract file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war to /Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp And on inspection on the war and the solr-core jar inside, I can see the missing classes, so I am pretty confused. Has anyone else seen this before or have an idea on how to surmount it? I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user error. Thanks, Gary
Re: Extracting and indexing content from multiple binary files into a single Solr document
Jayendra, Brilliant! A very simple solution. Thank you for your help. Kind regards, Gary On 17 Nov 2010 22:09, Jayendra Patil <jayendra.patil@gmail.com> wrote: The way we implemented the same scenario is zipping all the attachments into a single zip file which can be passed to the ExtractingRequestHandler for indexing and included as a part of single Solr document. Regards, Jayendra On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote: > Hi, > > We're trying to use Solr to replace a custom Lucene server. One > requirement we have is to be able to index the content of multiple binary > files into a single Solr document. For example, a uniquely named object in > our app can have multiple attached-files (eg. Word, PDF etc.), and we want > to index (but not store) the contents of those files in the single Solr doc > for that named object. > > At the moment, we're issuing HTTP requests direct from ColdFusion and using > the /update/extract servlet, but can only specify a single file on each > request. > > Is the best way to achieve this to extend ExtractingRequestHandler to allow > multiple binary files and thus specify our own RequestHandler, or would > using the SolrJ interface directly be a better bet, or am I missing > something fundamental? > > Thanks and regards, > Gary. >
Extracting and indexing content from multiple binary files into a single Solr document
Hi, We're trying to use Solr to replace a custom Lucene server. One requirement we have is to be able to index the content of multiple binary files into a single Solr document. For example, a uniquely named object in our app can have multiple attached-files (eg. Word, PDF etc.), and we want to index (but not store) the contents of those files in the single Solr doc for that named object. At the moment, we're issuing HTTP requests direct from ColdFusion and using the /update/extract servlet, but can only specify a single file on each request. Is the best way to achieve this to extend ExtractingRequestHandler to allow multiple binary files and thus specify our own RequestHandler, or would using the SolrJ interface directly be a better bet, or am I missing something fundamental? Thanks and regards, Gary.
Re: synonyms not working with copyfield
Hi Surajit I'm not sure if this is any help, but I had a similar problem, only with stop words: they were not working with dismax queries. To cut a long story short, it seems that all the queried fields need to be configured with stopwords. Maybe the same applies to the synonyms configuration, so your copyField target should be defined with a type that is configured with the SynonymFilterFactory, just like "person_name". You can find some guidance here: http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/ Gary
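To make the suggestion concrete, here is a minimal schema.xml sketch. Only person_name is mentioned above; the field name all_text, the type name text_syn and the exact filter chain are placeholders, not Surajit's actual configuration:

  <field name="person_name" type="text_syn" indexed="true" stored="true"/>
  <field name="all_text"    type="text_syn" indexed="true" stored="false" multiValued="true"/>
  <copyField source="person_name" dest="all_text"/>

  <!-- the copyField *target* uses the same synonym- and stopword-aware type
       as the source field, so dismax queries against either behave alike -->
  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>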
Re: Strange NPE with SOLR-236 (Field collapsing)
Hi Eric, I catch the NPE in the NonAdjacentDocumentCollapser class and now it does return the data field-collapsed. However I cannot promise how accurate or correct this fix is because I have not had a lot of time to study all the code. It would be best if some of the experts could give us a clue. I made the change in solr/src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java, inner class FloatValueFieldComparator.
Re: Tomcat vs Jetty: A Comparative Analysis?
http://www.webtide.com/choose/jetty.jsp >> > - Original Message - >> > From: "Steve Radhouani" >> > To: solr-user@lucene.apache.org >> > Sent: Tuesday, 16 February, 2010 12:38:04 PM >> > Subject: Tomcat vs Jetty: A Comparative Analysis? >> > >> > Hi there, >> > >> > Is there any analysis out there that may help to choose between Tomcat >> and >> > Jetty to deploy Solr? I wonder wether there's a significant difference >> > between them in terms of performance. >> > >> > Any advice would be much appreciated, >> > -Steve >> > >>
Tomcat6 env-entry
It works excellently in Tomcat 6. The toughest thing I had to deal with is discovering that the environment variable in web.xml for solr/home is essential. If you skip that step, it won't come up. solr/home java.lang.String F:\Tomcat-6.0.14\webapps\solr - Original Message - From: "Charlie Jackson" <[EMAIL PROTECTED]> To: Sent: Monday, December 03, 2007 11:35 AM Subject: RE: Tomcat6? $CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can create it and it will work exactly the same way it did in Tomcat 5. It's not created by default because it's not needed by the manager webapp anymore. -Original Message- From: Matthew Runo [mailto:[EMAIL PROTECTED] Sent: Monday, December 03, 2007 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Tomcat6? In context.xml, I added.. I think that's all I did to get it working in Tomcat 6. --Matthew Runo On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote: The Solr wiki does not describe how to install Solr on Tomcat 6, and I have not managed it myself :( In the chapter "Configuring Solr Home with JNDI" the directory $CATALINA_HOME/conf/Catalina/localhost is mentioned, which does not exist with Tomcat 6. Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success (I can query the top-level page, but the "Solr Admin" link then does not work). Can anybody help? -- Dipl.-Inf. Jörg Kiegeland ikv++ technologies ag Bernburger Strasse 24-25, D-10963 Berlin e-mail: [EMAIL PROTECTED], web: http://www.ikv.de phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0 Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg board of directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO) supervising board: Prof. Dr. Bernd Mahr (chairman)
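The web.xml entry quoted at the top of this message lost its tags in the archive, and Matthew's context.xml addition was stripped entirely. As a hedged sketch, the standard servlet-spec markup for the surviving values would look roughly like this, and a typical context.xml equivalent is shown underneath; the element names, the override attribute and the placement inside <Context> are assumptions based on common Tomcat usage, not a copy of the posters' files:

  <!-- WEB-INF/web.xml of the solr webapp -->
  <env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-type>java.lang.String</env-entry-type>
    <env-entry-value>F:\Tomcat-6.0.14\webapps\solr</env-entry-value>
  </env-entry>

  <!-- context.xml alternative: an Environment element inside the webapp's <Context> -->
  <Environment name="solr/home" type="java.lang.String"
               value="F:\Tomcat-6.0.14\webapps\solr" override="true"/>

Only one of the two is needed: the env-entry makes the setting part of the webapp itself, while the Environment element keeps it in the container configuration.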
TEI indexing
Once again, thanks for your help getting Solr up and running. I'm wondering if anyone has any hints on how to prepare TEI documents for indexing - I was about to write some XSLT but didn't want to reinvent the wheel (unless it's punctured)? Regards Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
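No answer follows in this archive, so here is a minimal XSLT sketch of the kind of TEI-to-Solr transform being considered. It assumes TEI P5 documents (namespace http://www.tei-c.org/ns/1.0) and made-up Solr field names (id, title, text); a real mapping would need to match the target schema:

  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  xmlns:tei="http://www.tei-c.org/ns/1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
      <add>
        <doc>
          <field name="id"><xsl:value-of select="//tei:idno[1]"/></field>
          <field name="title"><xsl:value-of select="//tei:titleStmt/tei:title[1]"/></field>
          <!-- flatten all body text into one searchable field -->
          <field name="text">
            <xsl:for-each select="//tei:text//text()">
              <xsl:value-of select="normalize-space(.)"/>
              <xsl:text> </xsl:text>
            </xsl:for-each>
          </field>
        </doc>
      </add>
    </xsl:template>
  </xsl:stylesheet>

The resulting <add><doc> message can then be posted to /update with curl or post.jar in the usual way.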
RE: Null pointer exception
Hi Chris The /var/www/html/solr/data/ directory did exist. I tried opening up permissions completely for testing but no luck (the tomcat user had write permissions). I decided to trash the whole installation and start again. I downloaded last nights build and untarred it. Put the .war into $TOMCAT_HOME/webapps. Copied the example/solr directory as /var/www/html/solr. No JNDI file this time, just updated solrconfig to read /var/www/html/solr as my data.dir. I can access the admin page but when I try an index action from the commandline, or a search from the admin page, I get something like: "The requested resource (/solr/select/) is not available" I have other apps running under tomcat okay, seems like it can't find the lib .jars or can't access the classes within them? Stuck... Cheers Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946 -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, 15 May 2007 9:51 AM To: solr-user@lucene.apache.org Subject: RE: Null pointer exception : I am running v1.1.0. If I do a search (from the admin page), it throws : the following exception: : : java.lang.RuntimeException: java.io.IOException: : /var/www/html/solr/data/index not a directory does /var/www/html/solr/data/ exist? ... if so does the effective userID for tomcat have permission to write to it? if not does the effective userID for tomcat have permission to write to /var/www/html/solr/ ? -Hoss
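For reference, the solrconfig.xml change being described is a one-line dataDir setting. The exact value below is an assumption based on the directory layout mentioned earlier in the thread; Solr expects it to point at the directory that will contain the index directory:

  <!-- solrconfig.xml -->
  <dataDir>/var/www/html/solr/data</dataDir>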
RE: Null pointer exception
Thanks a lot for your reply, Chris. I am running v1.1.0. If I do a search (from the admin page), it throws the following exception: java.lang.RuntimeException: java.io.IOException: /var/www/html/solr/data/index not a directory There are no exceptions on starting Tomcat, only one warning regarding a JMS client lib not found (related to Cocoon). I have named a file solr.xml in my $TOMCAT_HOME/conf/Catalina/localhost directory containing the following: I am using the example configs (unmodified). Thanks again Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946 -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, 15 May 2007 7:27 AM To: solr-user@lucene.apache.org Subject: Re: Null pointer exception : I have tried indexing from the exampledocs which is just sitting in my : user home directory but now I get a null pointer exception after : running: just to clarify: are you using solr 1.1 or a nightly build? did you check the log file to ensure that there are no exceptions when you start tomcat? are you using the example solrconfig.xml and schema.xml? have you tried doing a search first without indexing any docs to see if that executes and (correctly) returns 0 docs? If I had to guess, I'd speculate that you aren't correctly using a system prop or JNDI to point Solr at your solr home dir, so it's not finding the configs; either that, or you've modified the configs and there is a syntax error -- either way there should be an exception when the server starts up, well before you update any docs. -Hoss
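The contents of that conf/Catalina/localhost/solr.xml file were stripped by the archive. A typical JNDI context fragment for this layout looks like the sketch below; the docBase path is a placeholder and override="true" is an assumption, so adjust both to the actual installation:

  <!-- $TOMCAT_HOME/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/var/www/html/solr" override="true"/>
  </Context>

Tomcat picks the file up on startup and binds solr/home into JNDI, which is what Hoss's reply is checking for.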
Null pointer exception
Hi All Thanks very much for your help with indexing setup. I should elucidate my directory/file setup just to check that I have everything in the right place. I have running under $TOMCAT_HOME/webapps the solr directory containing admin, WEB-INF and META-INF directories. Under my web root I have the solr directory containing the bin, conf and data directories. I have tried indexing from the exampledocs which is just sitting in my user home directory but now I get a null pointer exception after running: ./post.sh solr.xml Can anyone offer advice on this please? (I've attached the trace for reference) Thanks again Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946 May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log SEVERE: java.lang.NullPointerException at org.apache.solr.core.SolrCore.update(SolrCore.java:716) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684) at java.lang.Thread.run(Thread.java:595) May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log SEVERE: Exception during commit/optimize:java.lang.NullPointerException at org.apache.solr.core.SolrCore.update(SolrCore.java:763) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
Still having indexing problems
Hello I have tried indexing the example files using the Jetty method, rather than Tomcat, which still didn't work. I would prefer to use my Tomcat URL. After starting Jetty, I issued java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml as in the examples in the tutorial, but post.jar cannot be found... Where is it? Is there a path variable I need to set up somewhere? Any help greatly appreciated. Regards, Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
New user - indexing problems
Hi I'll probably be posting a bunch of stupid questions in the near future, so bear with me. I'm finding the documentation a little confusing. For starters, I've got Solr up and running under Tomcat on port 8080, and I can pull up the admin page, no problems. I'm running on RHEL AS 4, with curl installed. I'm not sure how to get indexing started - I tried the following: ./post.sh http://localhost:8080/solr/update solr.xml monitor.xml (from the exampledocs directory) and received this error message: The specified HTTP method is not allowed for the requested resource (HTTP method GET is not supported by this URL). Any help with this would be much appreciated. Regards Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946