Re: Not able to search Spanish word with accent in Solr
Hi, I am having the same kind of issue. I am not able to search accented Spanish characters, e.g. Según, próximos, etc. I have a field called attr_content which holds the content of a PDF file whose contents are in Spanish. I am using Apache Tika to index the contents of the PDF file. I have written a Java class which uses the Apache Tika classes to read the PDF contents and index them into Solr 3.5. Is there anything I might have missed? Could it be an encoding issue? Please help. Deep
Automatic cross linking
Hello, I'm looking to use Solr to create cross links in text. For example: I'd like to be able to send a text field (an article from my blog) in a request, and have Solr use a script/method to parse the text, find all matching category terms, and return the results. Do you have any suggestions, documentation, tutorials, or source code :) that could help me implement this? Regards. David
RE: Support for Mongolian language
I have already checked this link. I could not find any mention of the Mongolian language. Is there any plugin available for it? -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 2:04 AM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language Check out wiki.apache.org/solr/LanguageAnalysis. For some reason the above site takes a long time to open.
Re: Problem with xpath expression in data-config.xml
Thanks for analyzing the problem. But please let me note that I came to a somewhat different conclusion. Define for the moment title to be the primary unique key:

solr-4.3.0\example\example-DIH\solr\rss\conf\schema.xml:

<uniqueKey>title</uniqueKey>

solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml:

[BAD CASE] (irrespective of the predicate @rel='self')

<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88" pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
    </entity>
  </document>
</dataConfig>

[GOOD CASE]

<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88" pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
    </entity>
  </document>
</dataConfig>

Conclusion: It has nothing to do with the number of occurrences of the pattern.
[DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hello, I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...)
_ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in SOLR
_ So, to avoid maintaining a list of files, I'm trying to generate the list with an SQL query and to give the list of results to an XPathEntityProcessor, which will read the files

The query (select DISTINCT...) generates this result:

CHEMINRELATIF
3/0/000/3001

But the problem is that with the following configuration, no request to the db is made, according to the message returned by DIH.

statusMessages: {
  "Total Requests made to DataSource": "0",
  "Total Rows Fetched": "0",
  "Total Documents Processed": "0",
  "Total Documents Skipped": "0",
  "": "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
  "Committed": "2013-05-30 10:23:30",
  "Optimized": "2013-05-30 10:23:30",

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible?

The config:

<dataConfig>
  <dataSource name="accesPCN" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb"
              user="myuser" password="mypasswd" readOnly="true"/>
  <document>
    <entity name="requeteurNomsFichiersNotices" datasource="accesPCN"
            processor="SqlEntityProcessor"
            query="select DISTINCT... SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'"
            transformer="LogTransformer"
            logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug">
      <entity name="processorDocument" processor="XPathEntityProcessor"
              url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
              transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
              logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
              datasource="accesPCN"

I'm trying to inde

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
phone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---
Re: Query syntax error: Cannot parse ....
Hi, Indeed, with the # character encoded the query works fine. Thanks -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, May 29, 2013 at 9:43 PM, bbarani wrote: # has a separate meaning in a URL. You need to encode it. http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters
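For anyone else hitting this, a minimal sketch of the encoded form (the core and field names below are made up for illustration): a raw # starts the URL fragment, so everything after it is dropped before the request ever reaches Solr, and it has to travel as %23 instead.

curl "http://localhost:8983/solr/collection1/select?q=title:C%23&wt=json"
# unencoded, q=title:C# would arrive at Solr as just q=title:C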
Re: Sorting results by last update date
Thanks Shalin... It is Solr 3.6.2. Instead of NOW, I can use today's date (I did not know about this cache issue, thanks). Later I realized that it was my mistake that confused the asc and desc ordering results. After I get data from Solr, I do a MySQL query again, where the order changes. Regards Kamal On Wed, May 29, 2013 at 2:54 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote: Hi All I am trying to sort the results as per last updated date. My url looks as below. *fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date asc* With this I get the data in ascending order of last updated date. If I am trying to sort data in descending order, I use the below url *fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date desc* Here the data set is not ordered properly, mostly it looks to me data is ordered on basis of score, not last updated date. Can somebody tell me what I am missing here, why *desc* is not working properly for me. What is the field type of last_update_date? Which version of Solr? A side note: Using NOW in a filter query is inefficient because it doesn't use your filter cache effectively. Round it to the nearest time interval instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter -- Regards, Shalin Shekhar Mangar.
Re: Sorting results by last update date
sort=last_updated_date desc Maybe adding %20 will help: sort=last_updated_date%20desc
Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Did you declare that field name in outer entity? Not just select as in the query. Regards, Alex On 30 May 2013 04:31, jerome.dup...@bnf.fr wrote: Hello, I want to use a index a huge list of xml file. _ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...) _ I can do it using a LineEntityProcessor reading a list of files, generated externally, but I would prefer to generate the list in SOLR _ So to avoid to mantain a list of files, I'm trying to generate the list with an sql query, and to give the list of results to XPathEntityProcessor, which will read the file The query select DISTINCT... generate this result CHEMINRELATIF 3/0/000/3001 But the problem is that with the following configuration, no request do db is done, accoring to the message returned by DIH. statusMessages:{ Total Requests made to DataSource:0, Total Rows Fetched:0, Total Documents Processed:0, Total Documents Skipped:0, :Indexing completed. Added/Updated: 0 documents. Deleted 0 documents., Committed:2013-05-30 10:23:30, Optimized:2013-05-30 10:23:30, And the log: INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully Did some has already done the kind of configuration, or is just not possible? The config: dataConfig dataSource name=accesPCN type=JdbcDataSource driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:@mymachine:myport:mydb user=myuser password=mypasswd readOnly=true/ document entity name=requeteurNomsFichiersNotices datasource=accesPCN processor=SqlEntityProcessor query=select DISTINCT... SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001' transformer=LogTransformer logTemplate=In entity requeteurNomsFichiersNotices logLevel=debug entity name=processorDocument processor=XPathEntityProcessor url=file:///D:/jed/noticesBib/$ {accesPCN.CHEMINRELATIF} xsl=xslt/mnb/IXM_MNb.xsl forEach=/record transformer=LogTransformer,fr.bnf.solr.BnfDateTransformer logTemplate=Notice fichier: $ {accesPCN.CHEMINRELATIF} logLevel=debug datasource=accesPCN I'm trying to inde Cordialement, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 téléphone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exposition Guy Debord, un art de la guerre - du 27 mars au 13 juillet 2013 - BnF - François-Mitterrand / Grande Galerie Avant d'imprimer, pensez à l'environnement.
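If it helps, here is a rough sketch of the shape that usually works for this outer-SQL / inner-XPath pattern (entity and column names are taken from the config above, the "fichiersXml" data source name is made up). Two things worth double-checking: the placeholder should use the outer entity's name, requeteurNomsFichiersNotices, rather than the data source name accesPCN, and the inner XPathEntityProcessor entity normally needs a reader-type data source such as URLDataSource rather than the JDBC one.

<dataSource name="accesPCN" type="JdbcDataSource" ... />
<dataSource name="fichiersXml" type="URLDataSource" />
<document>
  <entity name="requeteurNomsFichiersNotices" dataSource="accesPCN"
          processor="SqlEntityProcessor"
          query="select ... as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'">
    <!-- the variable refers to the outer ENTITY name, not the dataSource name -->
    <entity name="processorDocument" processor="XPathEntityProcessor"
            dataSource="fichiersXml" forEach="/record"
            url="file:///D:/jed/noticesBib/${requeteurNomsFichiersNotices.CHEMINRELATIF}">
      <!-- field mappings for the XML go here -->
    </entity>
  </entity>
</document>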
Re: Automatic cross linking
Do it outside of solr or look at update request processors. E.g. UIMA integration as an example. Regards, Alex On 30 May 2013 02:52, It-forum it-fo...@meseo.fr wrote: Hello, I'm looking to use Solr for creating cross linking in text. For exemple : I'll like to be able to request for a text field, an article, in my blog. And that Solr use a script/method, request to parse the text, find all matching categories term and caps the results. Do you have any suggestion, documentation, tutorial, source code :), that could help me to realise this optimisation. Regards. David
SPLITSHARD: time out error
Hi, I have a timeout error when I try to split a collection with 15M documents. The exception (Solr version 4.3):

542468 [catalina-exec-27] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/collections params={shard=00&action=SPLITSHARD&collection=ST-0112_replicated} status=500 QTime=300028
542469 [catalina-exec-27] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.solr.common.SolrException: splitshard the collection time out:300s
  at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
  at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
  at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
582557 [catalina-exec-39] INFO org.apache.solr.update.SolrIndexSplitter – SolrIndexSplitter: partition #1
582561 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/disk2/node00.solrcloud/solr/home/0112_replicated_00_1_replica1/data/index,segFN=segments_1,generation=1,filenames=[segments_1]
582563 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – newest commit = 1[segments_1]

How can I split my collection without this error? - Best regards
Re: multiple field join?
Solr Join is _not_ an SQL subquery and won't work like one. There's a reason it's called a pseudo join in the JIRA issues. My advice: forget joins and try to write this in pure Solr query language. The more you try to use Solr like a database, the more you'll get into trouble. De-normalize your data and try again. Best Erick On Wed, May 29, 2013 at 10:34 PM, cmd.ares cmd.a...@gmail.com wrote: http://wiki.apache.org/solr/Join I found Solr join is actually an SQL subquery; does Solr support a 3-table join? The SQL looks like this: SELECT xxx, yyy FROM collection1 WHERE outer_id IN (SELECT inner_id FROM collection1 where zzz = vvv) and outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = xxx) and outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = rrr) How do I write the Solr request URL? thanks.
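That said, since the original question asked how to write the request URL: the three IN-subqueries translate mechanically into three join filter queries with Solr 4's {!join} parser. This is a sketch only, using the placeholder field names from the SQL above; per the advice above, denormalizing will usually behave and perform better.

curl "http://localhost:8983/solr/collection1/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fl=xxx,yyy" \
  --data-urlencode "fq={!join from=inner_id to=outer_id}zzz:vvv" \
  --data-urlencode "fq={!join from=inner_id2 to=outer_id2}ttt:xxx" \
  --data-urlencode "fq={!join from=inner_id3 to=outer_id3}ppp:rrr"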
Re: Not able to search Spanish word with accent in Solr
Deep: Have you looked through the rest of the thread and tried the suggestions? If so, what were the results? Best Erick On Thu, May 30, 2013 at 2:45 AM, Deep Lotia deeplo...@gmail.com wrote: Hi, I am having a same kind of issue. I am not able to search accented characters of spanish. For eg: - Según, próximos etc. I have field called attr_content which holds the content of a PDF file whose contents are in spanish. I am using Apache Tika to index the contents of a PDF file. I have wrote a java class which using the Apache Tika classes to read the PDF contents and index it to solr 3.5. Anything which can be missed? Is it be because of encoding issues. Please help. Deep
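In case it helps while working through the thread, one common way to make Según match segun (and vice versa) is to fold accents at both index and query time; the field type below is only a sketch with a made-up name (ASCIIFoldingFilterFactory is available in Solr 3.5), and it is also worth confirming that both the PDF text and the query reach Solr as UTF-8.

<fieldType name="text_es_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Según -> segun, próximos -> proximos at both index and query time -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>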
Removing a single value from a multiValue field
I have a Solr application with a multiValue field 'tags'. All fields are indexed in this application. There exists a uniqueKey field 'id' and a '_version_' field. This is running on Solr 4.x. In order to add a tag, the application retrieves the full document, creates a PHP array from the document structure, removes the '_version_' field, and then adds the appropriate tag to the 'tags' array. This is all then sent to Solr's update method via HTTP with 'overwrite=true'. Solr correctly replaces the extant document with the new document, which is identical with the exception of a new value for the '_version_' field and an additional value in the multiValued field 'tags'. This all works correctly. I am now adding a feature where one can remove tags. I am using the same business logic, however instead of adding a value to the 'tags' array I am removing one. I can confirm that the data being sent to Solr does not contain the removed tag. However, it seems that the old value for the multiValue field is persisted, that is the old tag stays. I can see that the '_version_' field has a new value, so I see that the change was properly committed. Is there a known bug where overwriting such a doc...:

<doc>
  <arr name="tags">
    <str>a</str>
    <str>b</str>
  </arr>
</doc>

...with this doc...:

<doc>
  <arr name="tags">
    <str>a</str>
  </arr>
</doc>

...has no effect? Can multiValued fields only be added to, but not removed? Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Problem with PatternReplaceCharFilter
Just count the characters in the literal portions of the patterns and include that many spaces in the replacement. So, "TextLine" would become the same number of spaces. It gets trickier if names are variable length. But I'm sure you could come up with patterns to replace one, two, three, etc. char names with equivalent spaces. But... if all of this is too difficult for you, some people might find it easier to preprocess the data before sending it to Solr. I mean, do you really need to highlight the content in such a cryptic input format? Ultimately you might be better off with a custom char filter - sometimes people can cope better with straight Java code than cryptic regular expression sequences. -- Jack Krupansky -Original Message- From: jasimop Sent: Thursday, May 30, 2013 12:46 AM To: solr-user@lucene.apache.org Subject: Re: Problem with PatternReplaceCharFilter Honestly, I have no idea how to do that. PatternReplaceCharFilter doesn't seem to have a parameter like preservePositions=true and optionally fillCharacter= . And I don't think I can express this simply as a regex. How would I count in a pure regex the length difference before and after the match? Well, the specific problem is that when highlighting, the term positions are wrong and the result is not a valid XML structure that I can handle. I expect something like <TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" /> but I get Tex<em>tLine</em> aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" /> Thanks for your help.
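To make the "count the characters" idea concrete, here is a sketch for a single fixed-length literal (purely illustrative; the real input would need one such rule per literal, or the custom char filter suggested above):

<analyzer>
  <!-- "<TextLine" is 9 characters and the replacement is 9 spaces, so every
       remaining character keeps its original offset and highlighting lines up -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="&lt;TextLine"
              replacement="         "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>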
Re: Support for Mongolian language
No, there is not. -- Jack Krupansky -Original Message- From: Sagar Chaturvedi Sent: Thursday, May 30, 2013 3:03 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language I have already checked this link. Could not find any hint about Mongolian language. Is there any plugin available for that? -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 2:04 AM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language Check out.. wiki.apache.org/solr/LanguageAnalysis For some reason the above site takes long time to open..
RE: Upgrade Solr index from 4.0 to 4.2.1
So having tried all combinations of LUCENE_40, 41 and 42 we're still having no success in getting our indexes to load with Solr 4.2.1... Any direction we can look into ? in our system the underlying data is very slow to re-index and would take an unreasonable amount of time at a customer site to wait for information to become available after an upgrade, so we're very hopeful there can be a way to upgrade a Lucene index properly. Thanks, Elran -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, May 22, 2013 2:25 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 LUCENE_40 since your original index was built with 4.0. As for the other, I'll defer to people who actually know what they're talking about. Best Erick On Wed, May 22, 2013 at 5:19 AM, Elran Dvir elr...@checkpoint.com wrote: My index is originally of version 4.0. My methods failed with this configuration. So, I changed solrconfig.xml in my index to both versions: LUCENE_42 and LUCENE_41. For each version in each method (loading and IndexUpgrader), I see the same errors as before. Thanks. -Original Message- From: Elran Dvir Sent: Tuesday, May 21, 2013 6:48 PM To: solr-user@lucene.apache.org Subject: RE: Upgrade Solr index from 4.0 to 4.2.1 Why LUCENE_42?Why not LUCENE_41? Do I still need to run IndexUpgrader or just loading will be enough? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, May 21, 2013 2:52 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42... Best Erick On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote: Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it. I got the following exception: May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:779) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:822) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 
10 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) at org.apache.solr.core.SolrCore.init(SolrCore.java:797) ... 13 more Caused by: org.apache.solr.common.SolrException: Error opening Reader at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:183) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:179) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) ... 15 more Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path=/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx)) at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140) at
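For what it's worth, the Lucene IndexUpgrader mentioned earlier in the thread is run per index directory from the command line, roughly like this (the jar name is illustrative, the index path is taken from the stack trace above; it rewrites every segment in the newer format, and the core must not be open while it runs):

java -cp lucene-core-4.2.1.jar org.apache.lucene.index.IndexUpgrader -verbose /var/solr/multicore_solr/other_2013-05-04/data/index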
Re: Sorting results by last update date
You can just use NOW/DAY for a filter that would only change once a day: [NOW/DAY-60DAY TO NOW/DAY] Oops... make that: [NOW/DAY-60DAY TO NOW/DAY+1DAY] Otherwise, it would miss dates after the start of today. Even better, make it: [NOW/DAY-60DAY TO *] -- Jack Krupansky -Original Message- From: Kamal Palei Sent: Thursday, May 30, 2013 5:41 AM To: solr-user@lucene.apache.org Subject: Re: Sorting results by last update date Thanks Shalini... It is solr 3.6.2 Instead of NOW, I can use today's date (I did not know this cache issue,, thanks). Later I realized , it looks it is my mistake that misleads asc and desc ordering result. After I get data from solr, again I do mysql query where the order changes again. Regards Kamal On Wed, May 29, 2013 at 2:54 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote: Hi All I am trying to sort the results as per last updated date. My url looks as below. *fq=last_updated_date:[NOW-60DAY TO NOW]fq=experience:[0 TO 588]fq=salary:[0 TO 500] OR salary:0fq=-bundle:jobfq=-bundle:panelfq=-bundle:pagefq=-bundle:articlespellcheck=trueq=+java +sipfl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uidspellcheck.q=+java +sipqf=content^40qf=label^5.0qf=tos_content_extra^0.1qf=tos_name^3.0hl.fl=contentmm=1q.op=ANDwt=json json.nl=mapsort=last_updated_date asc * With this I get the data in ascending order of last updated date. If I am trying to sort data in descending order, I use below url *fq=last_updated_date:[NOW-60DAY TO NOW]fq=experience:[0 TO 588]fq=salary:[0 TO 500] OR salary:0fq=-bundle:jobfq=-bundle:panelfq=-bundle:pagefq=-bundle:articlespellcheck=trueq=+java +sipfl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uidspellcheck.q=+java +sipqf=content^40qf=label^5.0qf=tos_content_extra^0.1qf=tos_name^3.0hl.fl=contentmm=1q.op=ANDwt=json json.nl=mapsort=last_updated_date desc* Here the data set is not ordered properly, mostly it looks to me data is ordered on basis of score, not last updated date. Can somebody tell me what I am missing here, why *desc* is not working properly for me. What is the field type of last_update_date? Which version of Solr? A side note: Using NOW in a filter query is ineffecient because it doesn't use your filter cache effectively. Round it to nearest time interval instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter -- Regards, Shalin Shekhar Mangar.
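As a sketch, the rounded version of the filter from the query above, with the other parameters omitted and assuming a single-core 3.x-style URL (NOW/DAY only changes once a day, so the filter cache entry stays reusable; --data-urlencode also takes care of the space before asc/desc that came up elsewhere in this thread):

curl "http://localhost:8983/solr/select" \
  --data-urlencode "q=+java +sip" \
  --data-urlencode "fq=last_updated_date:[NOW/DAY-60DAY TO *]" \
  --data-urlencode "sort=last_updated_date desc" \
  --data-urlencode "wt=json"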
Re: Reindexing strategy
On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and something you hopefully already know: You'll want to get baseline measurements before you begin testing for comparison. Thanks. I wasn't looking for hard numbers, but rather for what the signs of problems are. I know to keep my eye on memory and CPU, but I have no idea how to check disk I/O, and I'm not even sure how to determine if it becomes saturated. One of the most reliable Solr-specific indicators of pushing your hardware too hard is that the QTime on your queries will start to increase dramatically. Solr 4.1 and later has more granular query time statistics in the UI - the median and 95% numbers are much more important than the average. Thank you, this will help. At least I now have a hard metric to see when Solr is getting overburdened (QTime). Outside of that, if your overall IOwait CPU percentage starts getting near (or above) 30-50%, your server is struggling. If all of your CPU cores are staying near 100% usage, then it's REALLY struggling. I see, thanks. Assuming you have plenty of CPU cores, using fast storage and having plenty of extra RAM will alleviate much of the I/O bottleneck. The usual rule of thumb for good query performance is that you need enough RAM to put 50-100% of your index in the OS disk cache. For blazing performance during a rebuild, that becomes 100-200%. If you had 150%, that would probably keep most indexes well-cached even during a rebuild. A rebuild will always lower performance, even with lots of RAM. Considering that the Solr index is the only place that the data is stored, and that users are actively using the system, I was not planning on a rebuild but rather to iteratively reindex the extant documents, even as new documents are being pushed in. My earlier reply to your other message has some other ideas that will hopefully help. Thank you Shawn! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
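For the disk I/O question, the stock Linux tools are usually enough to see whether the disks are the bottleneck (a sketch; iostat comes from the sysstat package, vmstat and top from procps):

iostat -x 5   # per-device utilisation and overall %iowait, refreshed every 5 seconds
vmstat 5      # the 'wa' column is CPU time spent waiting on I/O
top           # the 'wa' figure in the Cpu(s) line shows the same at a glance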
Re: What exactly happens to extant documents when the schema changes?
On Wed, May 29, 2013 at 5:09 PM, Shawn Heisey s...@elyograg.org wrote: I handle this in a very specific way with my sharded index. This won't work for all designs, and the precise procedure won't work for SolrCloud. There is a 'live' and a 'build' core for each of my shards. When I want to reindex, the program makes a note of my current position for deletes, reinserts, and new documents. Then I use a DIH full-import from mysql into the build cores. Once the import is done, I run the update cycle of deletes, reinserts, and new documents on those build cores, using the position information noted earlier. Then I swap the cores so the new index is online. I do need to examine sharding and multiple cores. I'll look into that, thank you. By the way, don't google for DIH! It took me some time to figure out that it is DataImportHandler, as some people use the acronym for something completely different. To adapt this for SolrCloud, I would need to use two collections, and update a collection alias for what is considered live. To control the I/O and CPU usage, you might need some kind of throttling in your update/rebuild application. I don't need any throttling in my design. Because I'm using DIH, the import only uses a single thread for each shard on the server. I've got RAID10 for storage and half of the CPU cores are still available for queries, so it doesn't overwhelm the server. The rebuild does lower performance, so I have the other copy of the index handle queries while the rebuild is underway. When the rebuild is done on one copy, I run it again on the other copy. Right now I'm half-upgraded -- one copy of my index is version 3.5.0, the other is 4.2.1. Switching to SolrCloud with sharding and replication would eliminate this flexibility, unless I maintained two separate clouds. Thank you. I am not using Solr Cloud but if I ever consider it, then I will keep this in mind. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
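For reference, the swap at the end of the rebuild described above is just the CoreAdmin SWAP action; a sketch with hypothetical core names:

# after the build core finishes indexing, make it the live one
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_live&other=shard1_build"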
Re: Removing a single value from a multiValue field
First, you cannot do any internal editing of a multi-valued list, other than: 1. Replace the entire list. 2. Add values on to the end of the list. But you can do both of those operations on a single multivalued field with atomic update, without reading and writing the entire document. Second, there is no "arr" element in the Solr Update XML format, only "field". To simply replace the full, current value of one multi-valued field:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="set">a</field>
    <field name="tags" update="set">b</field>
  </doc>
</add>

If you simply want to append a couple of values:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="add">a</field>
    <field name="tags" update="add">b</field>
  </doc>
</add>

To empty out a multivalued field:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="set" null="true" />
  </doc>
</add>

-- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, May 30, 2013 7:55 AM To: solr-user@lucene.apache.org Subject: Removing a single value from a multiValue field
Re: Sorting results by last update date
I wrote "Otherwise, it would miss dates after the start of today", but that should be "Otherwise, it would miss documents with times after the start of today if the current time is before noon". But use * and you will be better off anyway. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Thursday, May 30, 2013 8:27 AM To: solr-user@lucene.apache.org Subject: Re: Sorting results by last update date
Re: Problem with xpath expression in data-config.xml
Ah, I missed that part. The problem you have is because you have forEach="/feed/entry" but you want to read /feed/link as a common field. You need to have forEach="/feed | /feed/entry", which should let you have both /feed/link as well as /feed/entry/link. -- Regards, Shalin Shekhar Mangar.
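Applied to the "bad case" config earlier in the thread, that would look roughly like this (sketch only; with the extra /feed row in forEach, the commonField value read from /feed/link is carried over onto the entry rows):

<entity name="beautybooks88" pk="title"
        url="http://beautybooks88.blogspot.com/feeds/posts/default"
        processor="XPathEntityProcessor"
        forEach="/feed | /feed/entry"
        transformer="DateFormatTransformer">
  <field column="title" xpath="/feed/entry/title" />
  <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
</entity>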
Re: Removing a single value from a multiValue field
On Thu, May 30, 2013 at 3:42 PM, Jack Krupansky j...@basetechnology.com wrote: First, you cannot do any internal editing of a multi-valued list, other than: 1. Replace the entire list. 2. Add values on to the end of the list. Thank you. I meant that I am actually editing the entire document. Reading it, changing the values that I need, and then 'updating' it. I will look into updating only the single multiValued field. But you can do both of those operations on a single multivalued field with atomic update without reading and writing the entire document. Second, there is no arr element in the Solr Update XML format. Only field. To simply replace the full, current value of one multi-valued field: add doc field name=iddoc-id/field field name=tags update=seta/field field name=tags update=setb/field /doc /add If you simply want to append a couple of values: add doc field name=iddoc-id/field field name=tags update=adda/field field name=tags update=addb/field /doc /add To empty out a multivalued field: add doc field name=iddoc-id/field field name=tags update=set null=true / /doc /add Thank you. I will see about translating that into the JSON format that I work with. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Removing a single value from a multiValue field
You gave an XML example, so I assumed you were working with XML! In JSON...

[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]

and

[{"id": "doc-id", "tags": {"set": null}}]

BTW, this kind of stuff is covered in the book, with separate chapters for XML and JSON, each with dozens of examples like this. -- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, May 30, 2013 9:36 AM To: solr-user@lucene.apache.org Subject: Re: Removing a single value from a multiValue field
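To actually remove a tag this way, send the full new value of the field as a "set" (a sketch, assuming the default core name collection1; atomic updates also require an updateLog, the _version_ field, and that the other fields be stored):

curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H 'Content-type:application/json' \
  -d '[{"id": "doc-id", "tags": {"set": ["a"]}}]'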
Solr 4.3, Tomcat, Error filterStart
I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried with both a binary distro with the existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up Tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And Solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but none of the suggestions I found actually worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Re: Solr 4.3, Tomcat, Error filterStart
Hi Jonathan, Did you find http://stackoverflow.com/questions/3016808/tomcat-startup-logs-severe-error-filterstart-how-to-get-a-stack-trace ? Steve On May 30, 2013, at 10:10 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried using the Tried with both a binary distro with existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but nothing I found suggested succesfully worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Re: Solr 4.3, Tomcat, Error filterStart
Usually tomcat errors with Solr 4.3 happen due to uncopied logging libraries. I would check if installing Solr 4.2.1 works and/or copy additional libraries in (search mailing list for this issue). However, I am not entirely sure that's the case here. It feels that perhaps the definition of the handler could be a bigger issue here. I assume you have an xml file somewhere that defines that /solr maps to solr.war. I would double check that. Maybe try to deploy something smaller and easier and see what the difference is. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:10 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried using the Tried with both a binary distro with existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but nothing I found suggested succesfully worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Fwd: indexing only selected fields
-- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new to Solr and need your advice... Does anybody know how to index not all the fields in an uploaded document, but only those which I mentioned in the schema, and just ignore the other fields and symbols? Is it possible?
Re: Solr 4.3, Tomcat, Error filterStart
I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
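In practice the fix usually amounts to copying the logging jars and configuration that were pulled out of the 4.3 war back onto Tomcat's classpath, roughly like this (paths are illustrative; see the SolrLogging page above for the authoritative steps):

# SLF4J/log4j jars that no longer ship inside solr.war as of 4.3
cp solr-4.3.0/example/lib/ext/*.jar $CATALINA_HOME/lib/
# a log4j configuration somewhere on the classpath
cp solr-4.3.0/example/resources/log4j.properties $CATALINA_HOME/lib/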
Re: indexing only selected fields
How are you submitting your document? Some methods automatically ignore unknown fields, others complain. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Re: Fwd: indexing only selected fields
-- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ??? This should be exactly how Solr works. The only way that you would get fields not explicitly mentioned in your schema is if they match a dynamic field wildcard ... but that would also be in your schema, so it doesn't change what I'm saying. Thanks, Shawn
Re: indexing only selected fields
Alex, thank you for the answer. I am submitting by the POST method via curl... For example, when I want to submit a document I type in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @base.info -H 'Content-type:application/json' where base.info is my file with the information which I want to index. Could you tell me in which ways (methods) I can automatically omit unknown fields? It would be easier to select only the needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
RE: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
I don't want to dissuade you from trying, but I believe FileListEntityProcessor has something special coded up into it to allow for its unique usage. Not sure if your approach isn't do-able. I would imagine that fixing FLEP to handle a row-at-a-time or page-at-a-time in memory wouldn't be terribly hard, but I haven't looked either. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Thursday, May 30, 2013 6:08 AM To: solr-user@lucene.apache.org Subject: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor Did you declare that field name in the outer entity? Not just select it in the query. Regards, Alex On 30 May 2013 04:31, jerome.dup...@bnf.fr wrote: Hello, I want to index a huge list of xml files. _ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...) _ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in SOLR. _ So, to avoid maintaining a list of files, I'm trying to generate the list with an sql query and to give the list of results to XPathEntityProcessor, which will read the files. The query (select DISTINCT...) generates this result: CHEMINRELATIF 3/0/000/3001 But the problem is that with the following configuration, no request to the db is done, according to the message returned by DIH:

statusMessages: { Total Requests made to DataSource: 0, Total Rows Fetched: 0, Total Documents Processed: 0, Total Documents Skipped: 0, : Indexing completed. Added/Updated: 0 documents. Deleted 0 documents., Committed: 2013-05-30 10:23:30, Optimized: 2013-05-30 10:23:30,

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible? The config:

<dataConfig>
  <dataSource name="accesPCN" type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb" user="myuser" password="mypasswd" readOnly="true"/>
  <document>
    <entity name="requeteurNomsFichiersNotices" datasource="accesPCN" processor="SqlEntityProcessor"
            query="select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'"
            transformer="LogTransformer" logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug">
      <entity name="processorDocument" processor="XPathEntityProcessor"
              url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
              transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
              logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
              datasource="accesPCN"/>
    </entity>
  </document>
</dataConfig>

I'm trying to index... Regards, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 phone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exhibition Guy Debord, un art de la guerre - 27 March to 13 July 2013 - BnF - François-Mitterrand / Grande Galerie Before printing, think of the environment.
Re: Fwd: indexing only selected fields
Update Request Processors to the rescue! Example - Ignore input values for any undefined fields. Add to solrconfig:

<updateRequestProcessorChain name="ignore-undefined">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Index content:

curl "http://localhost:8983/solr/update?commit=true&update.chain=ignore-undefined" \
 -H 'Content-type:application/json' -d '
[{"id": "doc-1", "title": "Hello World", "features": ["Fast", "Cheap"], "bad_field_name": "Junk", "abstract": "Not in schema either"}]'

Results:

"id":"doc-1", "title":["Hello World"], "features":["Fast", "Cheap"],

(From the book!) -- Jack Krupansky -Original Message- From: Igor Littig Sent: Thursday, May 30, 2013 10:39 AM To: solr-user@lucene.apache.org Subject: Fwd: indexing only selected fields -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Re: indexing only selected fields
If you just want to remove anything that does not match, then the 'ignored' field type in the example schema would work. If you want to ignore specific fields but complain about any unexpected ones, you can still declare those specific fields but give them the ignored type. Or you could use Update Request Processors, like this one: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:55 AM, Igor Littig igor.lit...@gmail.com wrote: Alex Thank you for the answer. I am submitting by POST method via curl... For example when I want to submit a document I'm typing in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @ base.info -H 'Content-type:application/json' where base.info my file with information which I want to index. Could you in which ways(methods) I can automatically omit unknown fields. It would be easier to select only needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
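For reference, the 'ignored' setup Alex refers to looks roughly like this in the stock example schema.xml (a sketch; exact attributes can differ between Solr versions):

<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<dynamicField name="*" type="ignored" multiValued="true" />

With that catch-all dynamicField in place, any field name that is not explicitly declared falls through to the ignored type and is silently dropped at index time.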
Re: Solr 4.3, Tomcat, Error filterStart
Thanks! I guess I should have asked on-list BEFORE wasting 4 hours fighting with it myself, but I was trying to be a good user and do my homework! Oh well. Off to the logging instructions, hope I can figure them out -- if you could update the tomcat instructions with the simplest possible way to get deploy in Tomcat to work, that'd def be helpful! On 5/30/2013 10:41 AM, Shawn Heisey wrote: I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
Re: SPLITSHARD: time out error
Shard splitting is buggy in 4.3. I recommend that you wait for the next release (4.3.1) before using this feature. That being said, the split is executed by the Overseer and will continue to happen even after the http request times out. There aren't enough hooks to monitor the progress of the operation. You can look at ZooKeeper clusterstate to see if the sub shards are up and running. In your case, the sub shards will be called 00_0 and 00_1 and should be in active state (both shardState and state attribute in zk should be active). On Thu, May 30, 2013 at 4:46 PM, yriveiro yago.rive...@gmail.com wrote: Hi, I have a time out error when I try to split a collection with 15M documents The exception (solr version 4.3): 542468 [catalina-exec-27] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/collections params={shard=00action=SPLITSHARDcollection=ST-0112_replicated} status=500 QTime=300028 542469 [catalina-exec-27] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.solr.common.SolrException: splitshard the collection time out:300s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166) at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) 582557 [catalina-exec-39] INFO org.apache.solr.update.SolrIndexSplitter – SolrIndexSplitter: partition #1 582561 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/disk2/node00.solrcloud/solr/home/0112_replicated_00_1_replica1/data/index,segFN=segments_1,generation=1,filenames=[segments_1] 582563 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – newest commit = 1[segments_1] How I can split my collection without this error? 
- Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/SPLITSHARD-time-out-error-tp4066991.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
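One way to check the sub shard state Shalin describes is to read clusterstate.json straight out of ZooKeeper with the standard ZooKeeper CLI (host and port below are assumptions; point it at your own ensemble):

zkCli.sh -server localhost:2181
# then, at the ZooKeeper prompt:
get /clusterstate.json
# look for the ST-0112_replicated_00_0 and _00_1 entries and check that the
# shardState/state attributes Shalin mentions are both "active"

The same JSON is also visible in the Cloud section of the Solr admin UI.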
Re: Solr 4.3, Tomcat, Error filterStart
I'm going to add a note to http://wiki.apache.org/solr/SolrLogging , with the Tomcat sample Error filterStart error, as an example of something you might see if you have not set up logging. Then at least in the future, googling solr tomcat error filterStart might lead someone to the clue that it might be logging. On 5/30/2013 10:41 AM, Shawn Heisey wrote: I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
Re: Solr 4.3, Tomcat, Error filterStart
On 5/30/2013 9:26 AM, Jonathan Rochkind wrote: Thanks! I guess I should have asked on-list BEFORE wasting 4 hours fighting with it myself, but I was trying to be a good user and do my homework! Oh well. Off to the logging instructions, hope I can figure them out -- if you could update the tomcat instructions with the simplest possible way to get deploy in Tomcat to work, that'd def be helpful! Commute done. I'm not a tomcat user, so the only thing I know about where to drop those jars and properties file is tomcat/lib ... do you have anything more specific that I can include in the wiki page? In particular, I'd like to know if there are any particular config files or other specific information I can list to help the reader locate where tomcat/lib lives. I suppose I can put what I do know and let someone with better knowledge update it. Thanks, Shawn
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hi, Thanks for your answer, it helped me move forward. The name of the entity was not good, not consistent with the schema. Now the first entity works fine: the query is sent to the database and returns the correct result. The problem is that the second entity, which is an XPathEntityProcessor entity, doesn't read the file specified in the url attribute, but tries to execute it as an sql query on my database. I tried to put a fake query (select 1 from dual) but it changes nothing. It's as if the XPathEntityProcessor entity behaved like an SqlEntityProcessor, using the url attribute instead of the query attribute. I forgot to say which version I use: SOLR 4.2.1 (this can be changed, it's just the beginning of the development). See below the config and the returned message.

The verbose output:

verbose-output:[ entity:noticebib,[ query,select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001', time-taken,0:0:0.141, null,--- row #1-, CHEMINRELATIF,3/0/000/3001.xml, null,-, entity:processorDocument,[ document#1,[ query,file:///D:/jed/noticesbib/3/0/000/3001.xml, EXCEPTION,org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: file:///D:/jed/noticesbib/3/0/000/3001.xml Processing Document # 1\r\n\tat org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow (DataImportHandlerException.java:71)\r\n\tat ... oracle.jdbc.driver.OracleStatementWrapper.execute (OracleStatementWrapper.java:1203)\r\n\tat org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init (JdbcDataSource.java:246)\r\n\t... 32 more\r\n, time-taken,0:0:0.124,

This is the configuration:

<dataSource name="accesPCN" ...my oracle ds definition... />
<dataSource name="racineNoticeDatasource" baseUrl="file:///D:/jed/noticesBib" type="URLDataSource" encoding="UTF-8"/>
<document>
  <entity name="noticebib" datasource="accesPCN" processor="SqlEntityProcessor" rootEntity="false"
          query="select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'">
    <field column="CHEMINRELATIF" name="CHEMINRELATIF" />
    <entity name="processorDocument" processor="XPathEntityProcessor" datasource="racineNoticeDatasource"
            url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}" query="SELECT 1 from dual"
            xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
            transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
            logTemplate="Notice fichier: ${noticebib.CHEMINRELATIF}" logLevel="debug"/>
  </entity>
</document>

Regards, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 phone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exhibition Guy Debord, un art de la guerre - 27 March to 13 July 2013 - BnF - François-Mitterrand / Grande Galerie Before printing, think of the environment.
Pivot Facets refining datetime, bleh
I've been trying to get into how distributed field facets do their work, but I haven't been able to uncover how they deal with this issue. Currently distrib pivot facets does a getTermCounts(first_field) to populate a list at the level it's working on. When putting together the data structure we set up a BytesRef, fill it in with the value using the FieldType.ReadableToIndexed call, and then add the FieldType.ToObject of that bytesRef and associated field. --From getTermCounts comes fieldValue-- termval = new BytesRef(); ftype.readableToIndexed(fieldValue, termval); pivot.add( value, ftype.toObject(sfield, termval) ); This works great for everything but datetime, as datetime's .ToObject turns it into a human readable string that is unconvertible - at least in my investigation. I've tried to use the FieldType.ToInternal but that also fails on the human readable datetime format. My original idea was to skip the aforementioned block of code and just add the fieldValue straight to the data structure. This caused some pivot facet tests to return wonky results, and I'm not sure if I should go down the path of trying to figure out those problems or if there is a different approach I should be taking. Any general guidance on how distributed field facets deal with this would be much appreciated.
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
On Thu, May 30, 2013 at 11:44 AM, jerome.dup...@bnf.fr wrote: entity name=processorDocument processor=XPathEntityProcessor datasource=racineNoticeDatasource url=file:///D:/jed/noticesbib/$ {noticebib.CHEMINRELATIF} I've seen this one before. 'dataSource' is case sensitive, you said 'datasource'. DIH does not complain but instead just picks up the default (first?) processor which happens to be SQL one. Change one letter, see if it fixes it. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
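For completeness, the inner entity with that one-letter fix applied would read (taken from Jérôme's config, untested):

<entity name="processorDocument" processor="XPathEntityProcessor" dataSource="racineNoticeDatasource"
        url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}"
        xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
        transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
        logTemplate="Notice fichier: ${noticebib.CHEMINRELATIF}" logLevel="debug"/>

With dataSource spelled with a capital S, DIH should resolve the URLDataSource named racineNoticeDatasource instead of falling back to the JDBC one, and the fake query attribute should no longer be needed.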
Re: indexing only selected fields
Ok, that is clear. Thanks fo the answer 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com If you want to just removing anything that does not match then 'ignored' field type in example schema would work. If you want to ignore specific fields but complain on any unexpected things you can still use specific fields but with ignored type. Or you could use Update Request Processors like this one: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:55 AM, Igor Littig igor.lit...@gmail.com wrote: Alex Thank you for the answer. I am submitting by POST method via curl... For example when I want to submit a document I'm typing in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @ base.info -H 'Content-type:application/json' where base.info my file with information which I want to index. Could you in which ways(methods) I can automatically omit unknown fields. It would be easier to select only needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Rollback from Solr4.2.1 to Solr3.5
Hi, We recently had a production release to upgrade our Solr 3.5 to Solr 4.2.1 (no schema changes except some basic ones required for 4.2.1). The nature of our documents is that we have huge multivalued fields; they can go from 1000 to 100K values in one single field.

# Documents: 300K
# Index size: 9GB (all fields are stored and 5 are indexed)
# JVM Heap: 4GB

We haven't seen more than 10% CPU and 60% JVM heap during our usage, where we get 7K to 10K requests per min for this server. After upgrading to 4.2.1 we saw the CPU spike to a constant 75% and heap usage grow to 95% within 5 mins of traffic. Later the server becomes slow to unresponsive and we start seeing connection timeouts. We did a couple of adjustments to the JVM heap but still couldn't get it resolved, and had to roll back to 3.5 as we were exceeding our deployment window. During our investigation we identified that the queries causing the problem are the ones fetching the huge multivalued fields. Decompressing is killing the server. I have reported this issue earlier, which happened to be fixed in 4.2.1, but I'm not sure if there is another side effect of compressed fields that still remains. Any advice is much appreciated. thanks Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Rollback-from-Solr4-2-1-to-Solr3-5-tp4067094.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.
Hoss, thanks a lot for the explanation. We override most of the methods of query component(prepare,handleResponses,finishStage etc..) to incorporate custom logic and we set the _responseDocs values based on custom logic (after filtering out few data) and then we call the parent(super) method(query component) with the modified responsedocs. Thats the main reason we are using the _responsedocs variable as is.. -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904p4067086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4.3: write.lock is not removed
How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
Continue Indexing Documents when single doc does not match schema
I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using Nutch's solrindex to index documents into Solr. When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). Ideally, I would like this document to be skipped, an error written to the log file for later investigation, and the indexing of the remainder of the parsed documents to continue. Instead the job fails. I have tried setting <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError> in solrconfig.xml and restarting tomcat, but that does not seem to make a difference. Where else should I be looking?
solr 3.6 use only one CPU
We have a solr instance running on a 4 CPU box. Sometimes, we send a query to our solr server and it takes up 100% of one CPU and 60% of memory. I assume that if we send another query request, solr should be able to use another idling CPU. However, that is not the case. Using top, I only see one cpu busy, and the client side just gets stuck. Is solr 3.6 able to do multithreading to process requests? Ming-
Re: Continue Indexing Documents when single doc does not match schema
On 5/30/2013 11:03 AM, Iain Lopata wrote: When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). Ideally, I would like this document to be skipped, an error written to the log file for later investigation, and the indexing of the remainder of the parsed documents to continue. Instead the job fails. I have tried setting abortOnConfigurationError${solr.abortOnConfigurationError:false}/abortOnC onfigurationError in solrconfig.xml and restarting tomcat, but that does not seem to make a difference. That config option just tells Solr whether or not initial startup should fail if there's a configuration error in config files like solrconfig.xml. In most cases, you want it to be true. I don't think anything currently exists to do what you want. The feature request issue has been around for a long time, and it's had some relatively recent activity, at least compared to its creation date: https://issues.apache.org/jira/browse/SOLR-445 I haven't looked at the patch, but I would imagine that it just needs to be updated for the many source code changes since it was created, then examined to make sure it's correctly implemented. Thanks, Shawn
Re: solr 3.6 use only one CPU
On 5/30/2013 11:12 AM, Mingfeng Yang wrote: We have a solr instance running on a 4 CPU box. Sometimes, we send a query to our solr server and it take up 100% of one CPU and 60% of memory. I assume that if we send another query request, solr should be able to use another idling CPU. However, it is not the case. Using top, I only see one cpu is busy, and the client side just gets stucked. Is solr 3.6 able to do multithreading to process requests? Solr is completely multithreaded, and has been for as long as I've been using it, which started with version 1.4.0. If you only send it one request at a time, it will only use one CPU. Your client code must be multithreaded as well. I don't have enough information to tell you whether your server is sized appropriately for your index. Here's some general information: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
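To illustrate the client side, here is a minimal sketch of firing queries concurrently with SolrJ; HttpSolrServer is the 4.x client class (in 3.6 the equivalent is CommonsHttpSolrServer), and the URL and thread count are assumptions:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ParallelQueries {
    public static void main(String[] args) throws Exception {
        // one shared, thread-safe client; each submitted task is an independent request
        final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // concurrent requests let Solr spread work across CPUs
                        long hits = solr.query(new SolrQuery("*:*")).getResults().getNumFound();
                        System.out.println("numFound=" + hits);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}

With only one outstanding request at a time, only one CPU will ever be busy, no matter how many cores the box has.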
Find rows within range of other rows
I need to do a query where I need to find all people who have done 2 events within a range. I currently log one row per event. Example:

Person,Date,ViewedUrl
1,2012May10,google.com
2,2012May10,yahoo.com
1,2012May13,yahoo.com
2,2012May13,google.com

A sample request would be wanting to find all people who viewed yahoo.com within a week of viewing google.com, so I would want to return 1 group of values for person 1. Any ideas? Thanks, Mike
Re: Continue Indexing Documents when single doc does not match schema
On Thu, May 30, 2013 at 1:03 PM, Iain Lopata ilopa...@hotmail.com wrote: For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). No help on the core topic, but a workaround for the specific situation could be: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
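If that processor fits, a hedged sketch of wiring it into solrconfig.xml (the chain name and the address field are just illustrations from this thread, not a tested config):

<updateRequestProcessorChain name="first-address-only">
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">address</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

It keeps only the first value seen for the selected field, so the document indexes instead of failing the multivalued check, but you should make sure silently dropping the extra addresses is acceptable for your data.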
RE: solr 4.3: write.lock is not removed
Hi, We just use CURL from PHP code to submit indexing requests, like: /update?commit=true... This worked well in solr 3.6.1. I saw the link you showed and really appreciate it (if there is no other choice I will change the java source code, but I hope there is a better way). Thanks very much for your help, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: solr 4.3: write.lock is not removed
I did more tests and get more info: the basic setting is that we created core from PHP CURl API where we define: schema config instanceDir=my_solr_home dataDir=my_solr_home/data/new_collection_name In solr 3.6.1 we donot need to define schema/config because conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use /update?commit=true..) 2/ Shutdown tomcat, I saw write.lock is gone 3/ Restart Tomcat, indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for helps, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit indexing request, like: /update?commit=true.. This worked well in solr 3.6.1. I saw the link you showed and really appreciate (if no other choice I will change java source code but hope there is a better way..)? Thanks very much for helps, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: multiple field join?
: My advice. Forget joins and try to write this in pure : Solr query language. The more you try to use Solr like : a database, the more you'll get into trouble. De-normalize : your data and try again. with that important caveat in mind, it is worth noting that what you are essentially asking about is using multiple filters, each containing a distinct join query... : outer_id IN (SELECT inner_id FROM collection1 where zzz = vvv) : and : outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = xxx) : and : outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = rrr)

?q=*:*
fq={!join from=inner_id to=outer_id}zzz:vvv
fq={!join from=inner_id2 to=outer_id2}ttt:xxx
fq={!join from=inner_id3 to=outer_id3}ppp:rrr

-Hoss
Re: solr 4.3: write.lock is not removed
: I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that after finishing : indexing : : write.lock : : is NOT removed. Later if I index again it still works OK. Only after I shutdown Tomcat : then write.lock is removed. This behavior caused some problem like I could not use luke : to observe indexed data. IIRC, this was an intentional change. In older versions of Solr the IndexWriter was only opened if/when updates needed to be made, but that made it impossible to safely take advantage of some internal optimizations related to NRT IndexReader reloading, so the logic was modified to always keep the IndexWriter open as long as the SolrCore is loaded. In general, your past behavior of pointing luke at a live solr index could have also produced problems if updates came into solr while luke had the write lock active. -Hoss
Re: Grouping results based on the field which matched the query
: I wanted to know if Solr has some functionality to group results based on : the field that matched the query. : : So if I have id, name and manufacturer in my document structure, I want to : know how many results are there because its manufacturer matched the q and : how many results are there because q matched the name field. there's a difference between *grouping* results by a query, and *counting* which subset of your results match your query. in general, it sounds like you are probably currently using something like dismax or edismax to search across multiple fields, ala... ? defType=dismax qf=name manufacturer q=user input if you want to count how many of those docs match the user input in either name or manufacturer, you can use facet.query and take advantage of local params to refer back to the user's main query input... facet=true facet.query={!field f=manufacturer v=$q} facet.query={!field f=name v=$q} ...however it's important to note that those counts won't necessarily add up to your numFound because some docs may match on multiple fields ... you may also not get any counts if your main query string is something complex, in which case you may want to ignore the local param (v=$q) and explicitly specify what the various facet.queries are. likewise, if you truly want to *group* the results based on querying on a specific field, you can use group.query instead... https://wiki.apache.org/solr/FieldCollapsing -Hoss
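To make the group.query variant concrete, the request could look roughly like this (a sketch reusing the dismax parameters above, with 'user input' standing in for the real query string):

?defType=dismax
 qf=name manufacturer
 q=user input
 group=true
 group.query=manufacturer:(user input)
 group.query=name:(user input)

Each group.query then comes back as its own group, with its own numFound and top documents.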
Collections API Reload killing my cloud
Everytime I try to do a reload using the collections API my entire cloud goes down and I cannot search it. The solrconfig.xml and schema.xml are good because when I just restart tomcat everything works fine. Here is the output of the collections api reload command: 59155087 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-00 message:{ operation:reloadcollection, name:productindex} 59155098 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Executing Collection Cmd : action=RELOAD 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-1:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-4:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155100 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-2:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155102 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-5:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155103 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-3:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155105 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-6:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155108 [http-bio-8080-exec-7] INFO org.apache.solr.core.CoreContainer – Reloading SolrCore 'productindex' using instanceDir: /srv/solr/productindex 59155109 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Check for collection zkNode:productindex 59155111 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Collection zkNode exists 59155112 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Load collection config from:/collections/productindex 59155114 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/srv/solr/productindex/' 59155166 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader 59155167 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/metadata-extractor-2.6.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 
'file:/srv/solr/contrib/extraction/lib/vorbis-java-core-0.1.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-3.8.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/rome-0.9.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/jdom-1.0.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-ooxml-schemas-3.8.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/commons-compress-1.4.1.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/icu4j-49.1.jar' to
Re: Solr 4.3, Tomcat, Error filterStart
Okay, sadly, I still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors This is very frustrating. I have no way to even be sure this problem really is logging related, although it seems likely. But I feel like I'm just randomly moving chairs around and hoping the error will go away, and it does not. Is there anyone that has successfully run Solr 4.3.0 in a Tomcat 6? Can we even confirm this is possible? Can anyone give me any other hints? In particular, does anyone have any idea how to get more logging out of Tomcat than the fairly useless Error filterStart? The only reason I'm using tomcat is that it's what we have always used in our current Solr 1.4-based application, for reasons lost to time. I was hoping to upgrade to Solr 4.3 without simultaneously switching our infrastructure from tomcat to jetty, to change one thing at a time. I suppose I might need to abandon that and switch to jetty too, but I'd rather not.
Re: Collections API Reload killing my cloud
https://issues.apache.org/jira/browse/SOLR-4805 - Mark On May 30, 2013, at 3:09 PM, davers dboych...@improvementdirect.com wrote: Everytime I try to do a reload using the collections API my entire cloud goes down and I cannot search it. The solrconfig.xml and schema.xml are good because when I just restart tomcat everything works fine. Here is the output of the collections api reload command: 59155087 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-00 message:{ operation:reloadcollection, name:productindex} 59155098 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Executing Collection Cmd : action=RELOAD 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-1:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-4:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155100 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-2:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155102 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-5:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155103 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-3:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155105 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-6:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155108 [http-bio-8080-exec-7] INFO org.apache.solr.core.CoreContainer – Reloading SolrCore 'productindex' using instanceDir: /srv/solr/productindex 59155109 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Check for collection zkNode:productindex 59155111 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Collection zkNode exists 59155112 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Load collection config from:/collections/productindex 59155114 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/srv/solr/productindex/' 59155166 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader 59155167 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/metadata-extractor-2.6.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader 
59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/vorbis-java-core-0.1.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-3.8.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/rome-0.9.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/jdom-1.0.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-ooxml-schemas-3.8.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/commons-compress-1.4.1.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding
Re: Collections API Reload killing my cloud
Is it possible that this has something do do with it? 59157032 [Thread-2] INFO org.apache.solr.cloud.Overseer – Update state numShards=null message={ numShards=null -- View this message in context: http://lucene.472066.n3.nabble.com/Collections-API-Reload-killing-my-cloud-tp4067141p4067151.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3, Tomcat, Error filterStart
On 5/30/2013 1:19 PM, Jonathan Rochkind wrote: Okay, sadly, i still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: OK, at this point, you've got Solr's logging configured, but your tomcat log won't be used -- the default logging destination has changed to log4j. You might need to edit the log4j.properties file so that it points at a location that exists - the default is logs/solr.log, relative to the current working directory of the tomcat process. Once the log4j destination gets created properly, you can look there for Solr's logs, which will hopefully give you additional insight. If you want it to work with tomcat exactly how it did before, then you can go back to the old logging method (java.util.logging) with another section on that page: http://wiki.apache.org/solr/SolrLogging#Switching_from_Log4J_back_to_JUL_.28java.util.logging.29 Thanks, Shawn
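For reference, a minimal log4j.properties along the lines of the one shipped in example/resources (the file path is the part most likely to need adjusting; treat this as a sketch rather than the exact shipped file):

# log INFO and above to the console and to a rolling file
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
# point this somewhere the tomcat user can actually write
log4j.appender.file.File=logs/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n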
SolrCloud running away with resources
I've set up a simple 10 node, 5 shard SolrCloud 4.3. I'm pushing just a few thousand documents into it. What I'm doing is rather write intensive: 100x or more writes than reads. I've noticed that there seems to be an unbounded use of resources. I'm seeing a steadily increasing number of network connections (monitored via netstat | wc -l, which returns over 5,500 and is growing by about 50 per minute) and over 2,200 open file descriptors (as shown on the Solr dashboard). It seems like something is not configured correctly. At some point, rather soon I'm afraid, I'll run out of resources. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-running-away-with-resources-tp4067154.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3, Tomcat, Error filterStart
Okay, for posterity: I did manage to get it working. It WAS lack of the logging files. First, the only way I could manage to get Tomcat6 to log an actual stacktrace for the Error filterStart was to _delete_ my CATALINA_HOME/conf/logging.properties file. Apparently without this file at all, the default ends up being 'log everything'. And once that happened, it did confirm that the Error filterStart problem WAS an inability to find the logging jars. (And the stack trace was an exception from Solr with a nice message including the URL to the logging wiki page, nice one solr). Nothing I tried before deleting that file entirely (in a fit of desperation) worked to get the stack trace logged. Once it was confirmed that the problem really was not finding the logging jars, I could keep doing things and restarting and seeing if that was still the exception. And I found that for some reason, despite http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html suggesting that jars could be found in either CATALINA_BASE/lib (for me /opt/tomcat6/lib) or CATALINA_HOME/lib (for me /usr/share/tomcat6/lib), in fact for whatever reason /opt/tomcat6/lib was being ignored, but /usr/share/tomcat6/lib worked. And now I successfully have solr started in tomcat. I realize that these are all tomcat6 issues, not solr issues. But others trying to get solr started may have similar problems. Appreciate the tip that the Error filterStart was probably related to the new solr 4.3.0 logging setup, which ended up confirmed. Jonathan On 5/30/2013 3:19 PM, Jonathan Rochkind wrote: Okay, sadly, i still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors This is very frustrating. I have no way to even be sure this problem really is logging related, although it seems likely. But I feel like I'm just randomly moving chairs around and hoping the error will go away, and it does not. Is there anyone that has succesfully run Solr 4.3.0 in a Tomcat 6? Can we even confirm this is possible? Can anyone give me any other hints, especially does anyone have any idea how to get some more logging out of Tomcat, then the fairly useless Error filterSTart? The only reason I'm using tomcat is that we always have in our current Solr 1.4-based application, for reasons lost to time. I was hoping to upgrade to Solr 4.3, without simultaneously switching our infrastructure from tomcat to jetty, change one thing at a time. I suppose I might need to abandon that and switch to jetty too, but I'd rather not.
indexing documents
Good day everyone. I recently faced another problem. I've got a bunch of documents to index. The problem is that they are at the same time the database for another application. These documents are stored in JSON format with the following shape: { "id": 10, "name": "dad 177", "cat": [{ "id": 254, "name": "124" }] } When I try to post them, I get the following error: ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Unknown command: id [8] Is there a way to index these documents without changing them? How can I modify the schema, or do I need to do something else?
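The Unknown command error usually means the top level of the JSON is being read as a command list rather than as a document, and the nested cat object also will not map onto a flat schema; the 4.x JSON update handler expects an array of flat field/value documents. A sketch of a reshaped record (the flattened field names cat_id and cat_name are invented for illustration):

[
  { "id": "10",
    "name": "dad 177",
    "cat_id": ["254"],
    "cat_name": ["124"] }
]

That file can then be posted with the same curl command used earlier in the thread, e.g. curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @base.info -H 'Content-type:application/json'.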
Re: 2 VM setup for SOLRCLOUD?
Jamey, You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points. It doesn't matter, technically, which one though you will find benefits to narrowing traffic to fewer (for purposes of better cache management). Internally SolrCloud will round-robin distribute requests to other shards once a query begins execution. But you do need an entry point externally to be defined through your load balancer. Hope this is useful! Jason On May 30, 2013, at 12:48 PM, James Dulin jdu...@crelate.com wrote: Working to setup SolrCloud in Windows Azure. I have read over the solr Cloud wiki, but am a little confused about some of the deployment options. I am attaching an image for what I am thinking we want to do. 2 VM’s that will have 2 shards spanning across them. 4 Nodes total across the two machines, and a zookeeper on each VM. I think this is feasible, but, I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the “collection” etc.) Thanks! Jamey
2 VM setup for SOLRCLOUD?
Working to setup SolrCloud in Windows Azure. I have read over the solr Cloud wiki, but am a little confused about some of the deployment options. I am attaching an image for what I am thinking we want to do. 2 VM's that will have 2 shards spanning across them. 4 Nodes total across the two machines, and a zookeeper on each VM. I think this is feasible, but, I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the collection etc.) Thanks! Jamey
RE: solr 4.3: write.lock is not removed
Hi, Thanks very much for the explanation! Could we configure it to get the old behavior? I am asking about this option because our app has many small cores, so we prefer to create/close the writer on the fly (otherwise we may run into memory issues quickly). We also do not need NRT for now. Thanks very much for your help, Lisheng -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, May 30, 2013 11:35 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed : I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that after finishing : indexing : : write.lock : : is NOT removed. Later if I index again it still works OK. Only after I shutdown Tomcat : then write.lock is removed. This behavior caused some problem like I could not use luke : to observe indexed data. IIRC, This was an intentional change. In older versions of Solr the IndexWRiter was only opened if/when updates needed to be made, but that made it impossible to safely take advantage of some internal optimizations related to NRT IndexReader reloading, so the logic was modified to always keep the IndexWriter open as lon as the SolrCore is loaded. In general, your past behavior of pointing luke at a live solr index could have also produced problems if updates came into solr while luke had the write lock active. -Hoss
RE: solr starting time takes too long
Hi Eric, Thanks very much for helps (I should have responded sooner): 1/ My problem in 3.6 turned out to be much related to the fact I did not share schema, after using shareSchema, the start time is reduced up to 80% (to my great surprise, previously I thought burden is most in solrconfig). 2/ I just upgraded to solr 4.3, but somehow I did not see all the fixes mentioned in the WIKI (like shareConfig), I saw the resolution is Won't fix, do you have plan to put the fix into next release? Thanks and best regards, Lisheng -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, May 22, 2013 4:57 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Zhang: In 3.6, there's really no choice except to load all the cores on startup. 10 minutes still seems excessive, do you perhaps have a heavy-weight firstSearcher query? Yes, soft commits are 4.x only, so that's not your problem. There's a shareSchema option that tries to only load 1 copy of the schema that should help, but that doesn't help with loading solrconfig.xml. Also in the 4.3+ world there's the option to lazily-load cores, see: http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not an option, but I thought I'd mention it. But I'm afraid you're stuck. You might be able to run bigger hardware (perhaps you're memory-starved). Other than that, you may need to use more than one machine to get fast enough startup times. Best, Erick On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Thanks very much for quick helps! I searched but it seems that autoSoftCommit is solr 4x feature and we are still using 3.6.1? Best regards, Lisheng -Original Message- From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] Sent: Wednesday, May 22, 2013 12:17 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Hi Lisheng, I had the same problem when I enabled the autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem, Cheers. Carlos. 2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). Examing log file and code we found that for each core we loaded many resources, but in our app, we are sure we are always using the same solrconfig.xml and schema.xml for all cores. While we can config schema.xml to be shared, we cannot share SolrConfig object. But looking inside SolrConfig code, we donot use any of the cache. Could we somehow change config (or source code) to share resource between cores to reduce solr starting time? Thanks very much for helps, Lisheng
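For anyone searching for it later, shareSchema is set on the cores element of the legacy-style solr.xml, something like the sketch below; core names and paths here are placeholders:

<solr persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001" />
    <core name="core0002" instanceDir="core0002" />
    <!-- ...one core element per core... -->
  </cores>
</solr>

It only pays off when the cores genuinely point at identical schema.xml files, which is the situation described above.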
Re: OPENNLP problems
I will look at these problems. Thanks for trying it out! Lance Norskog On 05/28/2013 10:08 PM, Patrick Mi wrote: Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch; however I ran into 2 problems. Followed the wiki page instructions and set up a field with this type, aiming to keep nouns and verbs and do a facet on the field:
==
<fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==
Struggled to get that going until I put the extra parameter keepPayloads="true" in, as below:

<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>

Question: am I doing the right thing? Is this a mistake on the wiki? Second problem: Posted the document XML one by one to Solr and the result was what I expected:

<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
</add>

However, if I put multiple documents into the same XML file and post it in one go, only the first document gets processed (only 'check' and 'hotel' were showing in the facet result):

<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="text_opennlp_nvf">removes the payloads</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="text_opennlp_nvf">retains only nouns and verbs</field>
  </doc>
</add>

Same problem when updating the data via CSV upload. Is that a bug or something I did wrong? Thanks in advance! Regards, Patrick
RE: solr 4.3: write.lock is not removed
I did more tests and it seems that this is still a bug (previous issue 3/): 1/ Create a core via a CURL command with dataDir=some_folder; the core is created OK and later indexing works OK also. 2/ But in solr.xml, dataDir is not defined on the core element. 3/ After restarting Solr, the dataDir information is lost and Solr issues a WARN. 4/ If I manually add a dataDir attribute to the core element in solr.xml after the core is created, restarting Solr is fine. Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng Sent: Thursday, May 30, 2013 11:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: solr 4.3: write.lock is not removed I did more tests and got more info: the basic setting is that we create the core from the PHP CURL API, where we define: schema, config, instanceDir=my_solr_home, dataDir=my_solr_home/data/new_collection_name. In Solr 3.6.1 we do not need to define schema/config because the conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use /update?commit=true..) 2/ Shut down Tomcat; I saw write.lock is gone. 3/ Restart Tomcat; indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit the indexing request, like: /update?commit=true.. This worked well in Solr 3.6.1. I saw the link you sent and really appreciate it (if there is no other choice I will change the Java source code, but I hope there is a better way..). Thanks very much for your help, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using an indexing program? The post below discusses the same issue: http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
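For anyone hitting the same thing: the workaround in point 4/ amounts to making sure the core entry in solr.xml carries the same dataDir that was passed to the CoreAdmin CREATE call. A rough sketch with hypothetical names and paths (not the exact commands from this thread):

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_collection&instanceDir=my_solr_home&dataDir=my_solr_home/data/new_collection

<!-- legacy solr.xml: persisting dataDir on the core element keeps it across restarts -->
<core name="new_collection" instanceDir="my_solr_home" dataDir="my_solr_home/data/new_collection"/>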
Strip HTML Tags and Store
Hi AllI am trying to understand what gets stored when i configure a field indexed and stored for example i have this in my schema.xmlfield name=articleBody type=text_general indexed=true stored=true /and fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType I was expecting that solr will index store html strip content when i invoke query i get some thing like this str name=articleBodyxhtml:h1xhtml:bSouth African Miners Are Trapped by Debt/xhtml:b/xhtml:h1 xhtml:pxhtml:b▸ A surge in high-interest lending contributes to mine violence/xhtml:b/xhtml:p xhtml:pxhtml:b▸ At least one bank “may have reckless lending problems”/xhtml:b/xhtml:p xhtml:pIn 2008, platinum miner James Ntseane borrowed 8,000 rand ($886) from xhtml:bAfrican Bank Investments/xhtml:b to pay for his grandmother's funeral. Soon after, he took out two more loans, totaling 10,000 rand, for a sofa and house extension. Four years later he owes at least 30,515 rand, according to text messages he gets from African Bank, South Africa's biggest provider of unsecured loans. Under a court-ordered payment plan, his employer garnishes about 13 percent of his monthly 12,600-rand salary for the lender. He doesn't know how much interest he's paying. “They are taking too much money,” says Ntseane, 41./xhtml:p xhtml:pNtseane is one of more than 9 million South Africans mired in debt. African Bank, xhtml:bBayport Financial Services, Capitec Bank Holdings/xhtml:b, and other firms have led a boom in unsecured lending, charging interest as high as 80 percent a year, as is allowed there. Last year a series of strikes led to at least 46 deaths, the country's worst mining violence since the end of apartheid. “One of the contributing factors to all of these strikes has been this surge in unsecured lending,” says Mike Schussler, chief economist at the research group a href=http://economists.co.za/;Economists.co.za/a, echoing an October statement by Trade and Industry Minister Rob Davies./xhtml:p xhtml:pThe value of consumer loans not backed by assets such as homes rose 39 percent in the year through September, to 140 billion rand, reports the National Credit Regulator. The loans made up 10 percent of consumer credit on Sept. 30, up from 8 percent a year earlier. In November, South Africa's National Treasury and the Banking Association of South Africa agreed to review lending affordability rules, improve client education, and reduce wage garnishing after the number of people with bad credit rose to a record. Finance Minister Pravin Gordhan called the rise “worrying” a week earlier./xhtml:p xhtml:pGeorge Roussos, an executive for central support services at African Bank, says miner Ntseane borrowed more than he claims and took out a credit card. (The bank received permission from Ntseane, who denies the bank's figures, to discuss his account with xhtml:iBloomberg Businessweek/xhtml:i.) 
The bank says it stopped charging interest in 2011 and has no record of Ntseane making contact after he was injured in a home robbery in 2010. “The bank attempts to communicate clearly and transparently, employing multilingual consultants,” says Roussos./xhtml:p xhtml:pSouth African lenders have re sorted to court-ordered wage garnishing in more than 3 million active cases, according to the National Debt Mediation Association, a credit industry group that provides consumer debt counseling. Kem Westdyk, chief executive of xhtml:bSummit Garnishee Solutions/xhtml:b, which helps mining companies review bank requests, says at some companies up to 15 percent of workers have wages garnished; at one, more than a quarter of those cases involve African Bank. “They may have reckless lending problems,” says Westdyk, adding that some workers have five or six garnishee orders against them./xhtml:p xhtml:pNtseane says his loan agent didn't mention garnishment when she agreed to delay his loan payments. Although Davies and the country's credit regulator have pledged to clamp down on unsecured lending, Ntseane doesn't have high hopes. “I don't know when I will stop paying,” he says./xhtml:p xhtml:p prism:class=bylinexhtml:i—Franz Wild, Mike Cohen, and Renee Bonorchis/xhtml:i/xhtml:p xhtml:pxhtml:ixhtml:bThe bottom line/xhtml:b
Re: Reindexing strategy
On 5/30/2013 8:30 AM, Dotan Cohen wrote: On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and something you hopefully already know: you'll want to get baseline measurements before you begin testing, for comparison. Thanks. I wasn't looking for hard numbers, but rather for the signs of problems. I know to keep my eye on memory and CPU, but I have no idea how to check disk I/O, and I'm not sure how to determine whether it becomes saturated. On UNIX platforms, take a look at vmstat for basic I/O measurement, and iostat for more detailed stats. One coarse measurement is the number of blocked/waiting processes - usually this is due to I/O contention - and you will want to look at the paging and swapping numbers - you don't want any swapping at all. But the best single number to look at is overall disk activity, which is the I/O percentage-utilized number Shawn was mentioning. -Mike
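To make that concrete, two commands worth keeping open while a reindex runs (assuming a Linux box with the sysstat package installed; column names vary slightly between versions):

# refresh every 5 seconds; a %util column near 100 means that device is close to saturation
iostat -x 5

# the 'b' column counts processes blocked on I/O; 'si'/'so' should stay at 0 (no swapping)
vmstat 5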
RE: Support for Mongolian language
What would be the steps if we want to use Mongolian or any other language that is not supported? -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 30, 2013 5:43 PM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language No, there is not. -- Jack Krupansky
Re: Strip HTML Tags and Store
Update Request Processors to the rescue again. Namely, the HTML Strip Field update processor. Add to your solrconfig:

<updateRequestProcessorChain name="html-strip-features">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">features</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

See: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html

Index content:

curl "http://localhost:8983/solr/update?commit=true&update.chain=html-strip-features" \
 -H 'Content-type:application/json' -d '
[{"id": "doc-1",
  "title": "&lt;Hello World&gt;",
  "features": "<p>This is a <a>test</a> line &gt;.",
  "other_t": "<p>Other <b>text</b></p>",
  "more_t": "Some <b>more <i>text</i>.</b> The end"}]'

Results:

"id":"doc-1",
"title":["&lt;Hello World&gt;"],
"features":["\nThis is a test line >."],
"other_t":"<p>Other <b>text</b></p>",
"more_t":"Some <b>more <i>text</i>.</b> The end",

That stripped the HTML only from the features field, and expanded the named character entity as well. Add multiple <str> elements for multiple fields, or use fieldRegex, or... some other options. See: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html -- Jack Krupansky -Original Message- From: Kalyan Kuram Sent: Thursday, May 30, 2013 8:18 PM To: solr-user@lucene.apache.org Subject: Strip HTML Tags and Store
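If you would rather not pass update.chain on every request, one option (a sketch, assuming the stock /update handler) is to make the chain the handler's default in solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">html-strip-features</str>
  </lst>
</requestHandler>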
Re: Support for Mongolian language
Well, you would need a tokenizer, probably a stemmer, and a list of stop-words (to ignore). Is the original text in UTF-8 or in some alternative encoding? A quick search showed that there is an academic paper where they are trying to get Mongolian into Lucene. It seems quite relevant and would be a great place to start: http://scholar.google.ca/scholar?cluster=15851397934729234574&hl=en&as_sdt=0,5 It also lists a lot of the challenges that came up with other languages before UTF-8 became the main standard (Russian and Ukrainian come to mind). Hope it helps, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:49 PM, Sagar Chaturvedi sagar.chaturv...@nectechnologies.in wrote: What would be the steps if we want to use Mongolian or any other language that is not supported?
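As a concrete starting point along those lines, a minimal sketch of a schema.xml field type for a language with no dedicated analyzer; the stopwords file name is hypothetical, and there is no stemmer here because Lucene does not ship one for Mongolian:

<fieldType name="text_mn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Unicode-aware word breaking; a reasonable default when no language-specific tokenizer exists -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- hand-built stop-word list, one word per line, saved as UTF-8 -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_mn.txt"/>
  </analyzer>
</fieldType>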
RE: Support for Mongolian language
Thanks Alexandre for the link. It was really helpful. The original text will be in UTF-8.
Re: Support for Mongolian language
Try using the text_general field type and see how reasonable or unreasonable the standard tokenizer is at identifying word breaks for some sample Mongolian text. Use the Solr Admin UI Analysis page to see what the various term analysis filters output. -- Jack Krupansky
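The same check can also be scripted against the field-analysis request handler that ships in the example configs (a sketch; the core name and sample text are placeholders):

curl -G "http://localhost:8983/solr/collection1/analysis/field" \
  --data-urlencode "analysis.fieldtype=text_general" \
  --data-urlencode "analysis.fieldvalue=PASTE SAMPLE MONGOLIAN TEXT HERE" \
  --data-urlencode "wt=json"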
RE: Support for Mongolian language
Hi, In the Solr admin UI, I am trying to highlight some fields in a query. I have set hl=true and given a comma-separated list of field names in hl.fl, but the fields are not getting highlighted. Any insights? Regards, Sagar
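For comparison, a request of this shape normally does return highlights (field names here are hypothetical; the fields listed in hl.fl must be stored, and the snippets come back in a separate "highlighting" section of the response rather than inside the matched documents):

http://localhost:8983/solr/collection1/select?q=articleBody:debt&hl=true&hl.fl=title,articleBody&wt=json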
Highlighting fields
Sorry for wrong subject. Corrected it. -Original Message- From: Sagar Chaturvedi [mailto:sagar.chaturv...@nectechnologies.in] Sent: Friday, May 31, 2013 11:25 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language