Re: why don't we have a forum for discussion?
Martin Lamothe schrieb: This mailing list overloads my poor BB curve. You can configure BIS/BES not to deliver mailing list email to your device. Note that this mailing list is already available as a newsgroup via NNTP today. No need to subscribe; just get an NNTP news reader (e.g. Mozilla Thunderbird). :) news://news.gmane.org/gmane.comp.jakarta.lucene.solr.user -Gunnar -- Gunnar Wagenknecht gun...@wagenknecht.org http://wagenknecht.org/
Field Boosting Code
Hi, I was looking into the Solr code and trying to figure out where the code for field boosting is written. I am specifically looking for the classes that get called for that functionality. If somebody knows where the code is, it would be of great help. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Field-Boosting-Code-tp22118997p22118997.html Sent from the Solr - User mailing list archive at Nabble.com.
Boosting Code
Hi, Can anyone please tell me where I can find the actual logic/implementation of field boosting in Solr? I am looking for the classes. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Boosting-Code-tp22119017p22119017.html Sent from the Solr - User mailing list archive at Nabble.com.
Retrieve last indexed documents...
Hello everybody, I suppose this is a very common question, and I'm sorry if it has been answered before: how can I retrieve the last indexed documents? (I use a timestamp field defined as <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>.) Thanks, Pierre Landron
Re: Field Boosting Code
It's in Lucene. See the Field class. Assuming you mean boosting the Field at index time and not boosting the term (text + field name) at query time. On Feb 20, 2009, at 6:26 AM, dabboo wrote: Hi, I was looking into the Solr code and was trying to figure out as where the code for field boosting is written. I am specifically looking for classes, which gets called for that functionality. If somebody knows as where the code is, it will be of great help. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Field-Boosting-Code-tp22118997p22118997.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Add jdbc entity to DataImportHandler in runtime
Hello all! I'm trying to add jdbc entities to Solr at runtime. I can update data-config.xml and reload the file using the reload-config command, but I want the first import to cover only the new entities (not a full index), that is, to add to the index the data given by the queries in the new entities. How can I manage to do this? Thanks in advance.
Re: Add jdbc entity to DataImportHandler in runtime
On Fri, Feb 20, 2009 at 5:44 PM, Rui Pereira ruipereira...@gmail.com wrote: [...] You can use 'entity=changed_entity_1&entity=changed_entity_2' when calling full-import to import only the specified entities. -- Regards, Shalin Shekhar Mangar.
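With the ampersand restored, and with host/port/path assumed from elsewhere in this thread rather than stated by Shalin, such a call would look something like:

```
http://localhost:8080/solr/dataimport?command=full-import&entity=changed_entity_1&entity=changed_entity_2
```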
delta-import not giving updated records
Hi all, I am trying to run delta-import. For this I have the below data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="***" user="" password="*"/>
  <document>
    <entity name="users" transformer="TemplateTransformer pk="USER_ID"
            query="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP FROM USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID"
            deltaquery="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP FROM USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID">
      <field column="rowtype" template="users"/>
    </entity>
  </document>
</dataConfig>

But nothing happens when I call http://localhost:8080/solr/users/dataimport?command=delta-import, whereas dataimport.properties does get updated with the time at which delta-import was run, and http://localhost:8080/solr/users/dataimport?command=full-import is properly inserting data. Can anybody suggest what is wrong with this configuration? Thanks con -- View this message in context: http://www.nabble.com/delta-import-not-giving-updated-records-tp22120184p22120184.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: delta-import not giving updated records
There is a very good chance that the query created by DIH is wrong. Try giving the 'deltaImportQuery' explicitly in the entity. On Fri, Feb 20, 2009 at 6:48 PM, con convo...@gmail.com wrote: [...] -- --Noble Paul
Re: delta-import not giving updated records
1. There is no closing quote in transformer="TemplateTransformer 2. Attribute names are case-sensitive, so it should be deltaQuery instead of deltaquery. On Fri, Feb 20, 2009 at 6:48 PM, con convo...@gmail.com wrote: [...] -- Regards, Shalin Shekhar Mangar.
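Putting those two fixes together, the entity would look roughly as below. The deltaQuery here is illustrative rather than taken from the thread: a deltaQuery normally returns only the keys of rows changed since ${dataimporter.last_index_time}, with a deltaImportQuery to fetch each changed row.

```xml
<!-- Sketch only: quotes fixed, attribute renamed to deltaQuery; the delta
     queries below are illustrative assumptions, not con's actual SQL -->
<entity name="users" transformer="TemplateTransformer" pk="USER_ID"
        query="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP
               FROM USERS, CUSTOMERS where USERS.USER_ID = CUSTOMERS.USER_ID"
        deltaQuery="select USER_ID from USERS
                    where CREATED_TIMESTAMP &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select USERS.USER_ID, USERS.USER_NAME, USERS.CREATED_TIMESTAMP
                          FROM USERS, CUSTOMERS
                          where USERS.USER_ID = CUSTOMERS.USER_ID
                          and USERS.USER_ID = '${dataimporter.delta.USER_ID}'">
  <field column="rowtype" template="users"/>
</entity>
```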
Re: Retrieve last indexed documents...
Pierre, This is the issue to watch: https://issues.apache.org/jira/browse/SOLR-1023 I don't think there is a super nice way to do that currently. You could use the match-all query (*:*), sort by timestamp desc, and use start=0&rows=1. Using a raw timestamp that includes milliseconds is not recommended unless you really need milliseconds. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Pierre-Yves LANDRON pland...@hotmail.com To: solr-user@lucene.apache.org Sent: Friday, February 20, 2009 8:04:28 PM Subject: Retrieve last indexed documents... [...]
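Spelled out as a request (host, port and core path are assumptions, not from the thread), that suggestion would look like:

```
http://localhost:8983/solr/select?q=*:*&sort=timestamp+desc&start=0&rows=1
```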
Re: Add jdbc entity to DataImportHandler in runtime
Only one more question: doesn't full-import delete all records before execution, or in this case does it only delete the entities passed in the url? Thanks in advance, Rui Pereira On Fri, Feb 20, 2009 at 1:07 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...]
RE: Retrieve last indexed documents...
OK, thanks. That's what I've done; I'd kind of hoped that there was a nicer way to go, but after all, it works that way anyway... Cheers, P Landron Date: Fri, 20 Feb 2009 06:05:24 -0800 From: otis_gospodne...@yahoo.com Subject: Re: Retrieve last indexed documents... To: solr-user@lucene.apache.org [...]
concurrency problem with delta-import (indexing various cores simultaniously)
Hey there, I am indexing 3 cores concurrently from 3 different mysql tables (I do it every 5 minutes with a cron job). The three cores use JdbcDataSource as the datasource in data-config.xml. At some point, the core that fetches more mysql rows starts running so, so slow that the thread seems to stop (but the other two keep working fine)... but java doesn't throw an exception... I am using a nightly from early january. I found someone who experienced the same problem and uploaded a templateString patch to make it thread-safe. http://www.nabble.com/Concurrency-problem-with-delta-import-td21665540.html#a21665540 The thing is, even with this, the problem doesn't disappear. Does someone know what is happening?? Thank you. -- View this message in context: http://www.nabble.com/concurrency-problem-with-delta-import-%28indexing-various-cores-simultaniously%29-tp22120430p22120430.html Sent from the Solr - User mailing list archive at Nabble.com.
Defining shards in solrconfig with multiple cores
Hey All, I am trying to load balance two solr installations, solr1 and solr2. Each box is running 4 cores, core0 - core3. I would like to define the shards for each box in solrconfig as such:

<lst name="defaults">
  <str name="shards">solr1:8080/solr/core0,solr1:8080/solr/core1,solr1:8080/solr/core2,solr1:8080/solr/core3</str>
</lst>

For whatever reason /admin works. However, when I try to /select using this shards param in solrconfig.xml, the query just hangs. I've looked everywhere trying to figure this one out and the syntax looks right. The query works as it is supposed to when the shards param is removed from solrconfig.xml and appended to the url. However, I can't use the load balancer if I have to specify the shards host in the url. Am I doing something wrong or is this not supported yet? Is there a workaround that I can use? Thanks! Justin -- View this message in context: http://www.nabble.com/Defining-shards-in-solrconfig-with-multiple-cores-tp22120446p22120446.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: concurrency problem with delta-import (indexing various cores simultaniously)
On Fri, Feb 20, 2009 at 8:41 PM, Marc Sturlese marc.sturl...@gmail.com wrote: [...] Marc, I'd strongly recommend using a more recent nightly build. There was another problem related to unsafe usage of SimpleDateFormat which was fixed recently. See https://issues.apache.org/jira/browse/SOLR-1017 (which was fixed on 11th Feb) -- Regards, Shalin Shekhar Mangar.
Re: Add jdbc entity to DataImportHandler in runtime
On Fri, Feb 20, 2009 at 8:01 PM, Rui Pereira ruipereira...@gmail.com wrote: [...] If no 'entity' parameter is specified, a full-import deletes all existing documents. But if an 'entity' is specified, then the deleteQuery is not executed. There's no way for DataImportHandler to figure out which documents were generated by which entity. You can use the 'preImportDeleteQuery' attribute on an entity to specify a delete query which can delete the documents created by that entity. http://wiki.apache.org/solr/DataImportHandler#head-70d3fdda52de9ee4fdb54e1c6f84199f0e1caa76 -- Regards, Shalin Shekhar Mangar.
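A sketch of that pattern, with made-up entity and field names: TemplateTransformer stamps each document with the entity that produced it, and preImportDeleteQuery then deletes only those documents before the import runs.

```xml
<!-- Sketch; 'products' and 'source' are placeholder names, not from the thread -->
<entity name="products" transformer="TemplateTransformer"
        preImportDeleteQuery="source:products"
        query="select id, name from products">
  <!-- every document from this entity gets source=products -->
  <field column="source" template="products"/>
</entity>
```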
Re: concurrency problem with delta-import (indexing various cores simultaniously)
Hey, Yeah, I patched the SimpleDateFormat bug reported by Ryuuichi as well. Is there any other known concurrency bug that maybe I am missing? In my use case I could manage to index not concurrently, but I would like to discover why this is happening... Thank you very much! Shalin Shekhar Mangar wrote: [...] -- View this message in context: http://www.nabble.com/concurrency-problem-with-delta-import-%28indexing-various-cores-simultaniously%29-tp22120430p22123287.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: concurrency problem with delta-import (indexing various cores simultaniously)
On Fri, Feb 20, 2009 at 10:43 PM, Marc Sturlese marc.sturl...@gmail.comwrote: Hey, Yeah, I patched the bug reported by Ryuuichi of the SimpleDateFormat aswell. Is there any other known concurrency bug that maybe I am missing? In my use case I could manage to index not concurrently but would like to discover why this is happening... Thank you very much! I don't see any obvious issue except for these two fixes. Are you experiencing this problem even after applying both of Ryuuichi's fixes? -- Regards, Shalin Shekhar Mangar.
Question about etag
Hi guys, I'm having trouble understanding the behavior of firefox and the etag. After cleaning the cache, I send this request from firefox:

GET /solr/select/?q=television HTTP/1.1
Host: localhost:8088
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: JSESSIONID=AA71D602A701BB6287C60083DD6879CD

To which solr responds:

HTTP/1.1 200 OK
Last-Modified: Thu, 19 Feb 2009 19:57:14 GMT
ETag: NmViOTJkMjc1ODgwMDAwMFNvbHI=
Content-Type: text/xml; charset=utf-8
Transfer-Encoding: chunked
Server: Jetty(6.1.3)
(#data following#)

So far so good. But then I press F5 to refresh the page. Now, if I understand correctly the way the etag works, firefox should send the request with an If-None-Match along with the etag, and then the server should return a 304 Not Modified code. But what happens is that firefox just doesn't send anything. In the firebug window, I only see 0 requests. Just to make sure, I tested with tcpmon and nothing is sent by firefox. Is this making sense? Am I missing something? My solrconfig.xml has this config: Thanks!
Re: concurrency problem with delta-import (indexing various cores simultaniously)
Yes. Now it's almost three days non-stop that I have been running updates on the 3 cores with cron jobs. If there are updates of 1 docs everything is alright. When I start doing updates of 30 is when that core runs really slow. I have to abort the import in that core and keep updating with fewer rows each time. Another thing to point out is that tomcat reaches the maximum memory I allow (2Gig) and never goes down (but at least it doesn't run out of memory). Is that normal? Shouldn't the memory go down a lot after an update is completed? Thank you very much! Shalin Shekhar Mangar wrote: [...] -- View this message in context: http://www.nabble.com/concurrency-problem-with-delta-import-%28indexing-various-cores-simultaniously%29-tp22120430p22125443.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: concurrency problem with delta-import (indexing various cores simultaniously)
On Fri, Feb 20, 2009 at 11:23 PM, Marc Sturlese marc.sturl...@gmail.comwrote: Yes, Now it's almost tree days non-stop since I am running updates with the 3 cores with cron jobs. If there are updates of 1 docs everything is alrite. When I start doing updates of 30 is when that core runs really slow. I have to abort the import in that core and keep updating with less rows each time. Another thing to point is that tomcat reaches the maximum memory I allow (2Gig) and never goes down (but at least it doesn't run out of memory). Is that normal? Shouldn't the memory go down a lot after an update is completed? I guess you are being hit by garbage collection. Memory utilization should go down once an import completes. Which GC are you using? There have been a few recent threads on GC settings. Perhaps you can try out a few of those settings. I don't know how big your documents/index are but if possible give it more memory. -- Regards, Shalin Shekhar Mangar.
Re: concurrency problem with delta-import (indexing various cores simultaniously)
I am working with 3 indexes of 1 gig each. I am using the standard settings of the GC, haven't changed anything, and am using java version 1.6.0_07. I don't know much about GC configuration... I just read this http://marcus.net/blog/2007/11/10/solr-search-and-java-gc-tuning/ when, a month ago, I experienced another problem with Solr (in the end it was not the GC's fault). So, any advice about which GC I should try or what I should tune? Thank you very much! Shalin Shekhar Mangar wrote: [...] -- View this message in context: http://www.nabble.com/concurrency-problem-with-delta-import-%28indexing-various-cores-simultaniously%29-tp22120430p22125716.html Sent from the Solr - User mailing list archive at Nabble.com.
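For what it's worth, a common starting point on Java 6 looks something like the options below (the values are illustrative assumptions, not a recommendation from the thread; the logging flags are there so you can check whether full collections line up with the slowdowns):

```
# Illustrative Tomcat JAVA_OPTS: CMS collector plus GC logging
JAVA_OPTS="-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -verbose:gc -XX:+PrintGCTimeStamps"
```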
Re: Updating a single field of a document
Thanks Otis. Are these Solr-specific issues? In looking through Lucene's FAQ, it seems that you would have to delete the document and re-add it. Could a possible solution be to find the document by the unique id and set the fields that were changed, or would this not scale when doing a lot of document field updates? Which JIRA issues were you referring to? Thanks Amit On Thu, Feb 19, 2009 at 6:57 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Amit, This is still the case. I believe 2 separate issues related to this exist in JIRA, but none is in a finished state. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Amit Nithian anith...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, February 20, 2009 7:00:03 AM Subject: Updating a single field of a document Is there a way in Solr 1.2 (or Solr 1.3) to update a single field of an existing document if I know the primary key? Reason I ask is that I construct a document from multiple sources and some fields may need periodic updating from one of those sources. I would prefer not to have to reconstruct the entire document (and hence query the multiple sources) for a single field change. I noticed that Solr 1.2 will delete and add the new document rather than replace individual fields. Is there a way around this? Thanks Amit
Re: Updating a single field of a document
On Sat, Feb 21, 2009 at 1:00 AM, Amit Nithian anith...@gmail.com wrote: Thanks Otis. Are these Solr specific issues. In looking through Lucene's FAQ, it seems that you would have to delete the document and re-add. Could a possible solution be to find the document by the unique-id and set the fields that were changed or would this not scale when doing a lot of document field updates? Which JIRA issues were you referring to? https://issues.apache.org/jira/browse/SOLR-139 https://issues.apache.org/jira/browse/SOLR-828 -- Regards, Shalin Shekhar Mangar.
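Since neither issue is finished, the workaround is the read-modify-re-add cycle discussed above. A minimal sketch of the merge step (the function and field names are made up; this also assumes every field you care about is stored, otherwise the re-added document silently loses data):

```python
# Solr 1.2/1.3 replaces a document wholesale when one with the same uniqueKey
# is added, so a "single field update" must re-send the complete document:
# fetch the stored fields, overlay the changed ones, and re-add the result.

def merge_for_readd(stored_fields, changes, unique_key="id"):
    """Build the full document to re-add from the stored fields plus the
    fields that changed. Raises if the uniqueKey is missing, since Solr
    could not replace the old document without it."""
    if unique_key not in stored_fields:
        raise ValueError("cannot re-add without the uniqueKey field")
    doc = dict(stored_fields)  # copy everything Solr already stores
    doc.update(changes)        # overlay only the changed fields
    return doc

# Example: bump one field; the rest of the document rides along unchanged.
stored = {"id": "doc42", "title": "Old title", "views": 10}
print(merge_for_readd(stored, {"views": 11}))
```

The caveat is the one from Lucene's FAQ: any field that is indexed but not stored cannot be reconstructed this way.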
Re: Question about etag
Sorry, the xml of the solrconfig.xml was lost. It is <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"></httpCaching> [...] -- View this message in context: http://www.nabble.com/Question-about-etag-tp22125449p22127322.html Sent from the Solr - User mailing list archive at Nabble.com.
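For reference, a successful revalidation against that config would look roughly like the exchange below. Whether Firefox sends it at all is the open question here: since the 200 response carried no Cache-Control or Expires header, the browser may judge the cached copy fresh from the Last-Modified heuristic and serve it without contacting the server at all, which would match the "0 requests" observation.

```
GET /solr/select/?q=television HTTP/1.1
Host: localhost:8088
If-None-Match: NmViOTJkMjc1ODgwMDAwMFNvbHI=

HTTP/1.1 304 Not Modified
```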
Re: Defining shards in solrconfig with multiple cores
On Fri, Feb 20, 2009 at 10:32 AM, jdleider nab...@justinleider.com wrote: However when i try to /select using this shards param in the solrconfig.xml the query just hangs. The basic /select url should normally not have shards set as a default... this will cause infinite recursion when the top level searcher sends requests to the sub-searchers until you exhaust all threads and run into a distributed deadlock. Set up another handler with the default shards param instead. -Yonik Lucene/Solr? http://www.lucidimagination.com
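A sketch of such a separate handler for the config quoted in this thread (the handler name /distrib is made up; clients would query it instead of /select for distributed searches):

```xml
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">solr1:8080/solr/core0,solr1:8080/solr/core1,solr1:8080/solr/core2,solr1:8080/solr/core3</str>
  </lst>
</requestHandler>
```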
mapping pdf metadata
Hi, I'm having trouble figuring out how to map the tika metadata fields to my own solr schema document fields. I guess the first hurdle I need to overcome, is where can I find a list of the Tika PDF metadata fields that are available for mapping? Thanks, Josh
show first couple sentences from found doc
Hi, I would like to do something similar to Google, in that for my list of hits, I would like to grab the surrounding text around my query term so I can include that in my search results. What's the easiest way to do this? Thanks, Josh
Re: show first couple sentences from found doc
Josh Joy wrote: Hi, I would like to do something similar to Google, in that for my list of hits, I would like to grab the surrounding text around my query term so I can include that in my search results. What's the easiest way to do this? Thanks, Josh Highlighter? http://wiki.apache.org/solr/HighlightingParameters Koji
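Assuming the text lives in a stored field (the field name body below is an assumption, as are host and port), a Google-style snippet request with the highlighter would look something like:

```
http://localhost:8983/solr/select?q=television&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=100
```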
Re: mapping pdf metadata
Josh, You didn't mention whether you are using http://wiki.apache.org/solr/ExtractingRequestHandler , but if you are not, maybe this already has what you need: http://wiki.apache.org/solr/ExtractingRequestHandler#head-c413be32c951c89c0a28f4f8336aa7d2774ec2d6 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Josh Joy joshjd...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, February 21, 2009 9:11:01 AM Subject: mapping pdf metadata Hi, I'm having trouble figuring out how to map the tika metadata fields to my own solr schema document fields. I guess the first hurdle I need to overcome, is where can I find a list of the Tika PDF metadata fields that are available for mapping? Thanks, Josh
Re: mapping pdf metadata
And when you do use the ExtractingRequestHandler (aka Solr Cell), you can find the metadata fields by using the ext.extract.only=true setting. You might also find this article by Sami Siren helpful: http://www.lucidimagination.com/index.php?option=com_contenttask=viewid=106 Erik On Feb 20, 2009, at 8:39 PM, Otis Gospodnetic wrote: Josh, You didn't mention whether you are using http://wiki.apache.org/solr/ExtractingRequestHandler , but if you are not, maybe this already has what you need: http://wiki.apache.org/solr/ExtractingRequestHandler#head-c413be32c951c89c0a28f4f8336aa7d2774ec2d6 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Josh Joy joshjd...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, February 21, 2009 9:11:01 AM Subject: mapping pdf metadata Hi, I'm having trouble figuring out how to map the tika metadata fields to my own solr schema document fields. I guess the first hurdle I need to overcome, is where can I find a list of the Tika PDF metadata fields that are available for mapping? Thanks, Josh
Suggested hardening of Solr schema.jsp admin interface
My colleague Paul opened this issue and supplied a patch and I commented on it regarding a potential security weakness in the admin interface: https://issues.apache.org/jira/browse/SOLR-1031 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
What is the performance impact of a fq that matches all docs?
We are working on integration with the Drupal CMS, and so are writing code that carries out operations that might be relevant for only a small subset of the sites/indexes that might use the integration module. In this regard, I'm wondering if adding to the query (using the dismax or mlt handlers) an fq that matches all documents would have any impact on performance? I gather that there is caching for the fq matches, but it seems like that would still incur some overhead, especially for a large index? As a more concrete example, suppose each document has a string field that names the role of user that is allowed to see the content, e.g. 'public', 'registered', 'admin'. Most sites have only public content, but because our code is generic, we might add fq=role:public to every query. What would the expected performance effect be compared to omitting that fq if, for example, we had a way to determine in advance that all site content matches 'public'? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
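Concretely, the generic filter from the example would ride along on every request something like this (host and handler are assumptions). On the performance question: the fq is evaluated once, its DocSet is kept in Solr's filterCache, and subsequent queries pay roughly one cached-set intersection, so a filter matching all documents should be cheap after warm-up, though not entirely free.

```
http://localhost:8983/solr/select?qt=dismax&q=drupal&fq=role:public
```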