Re: Search in specific website
Hi again,

On the Nutch list I was told to use url:example\.net AND content:some keyword, and so I did. However, I still get results from both my URLs. Why this behaviour?

Regards,

PS: I've re-crawled and re-indexed my data.

On 10/12/2012 05:07 PM, Otis Gospodnetic wrote:
Hi Tolga, You'll get more help on the Nutch mailing list. I don't know the schema Nutch uses for Solr off the top of my head, so I can't tell you whether it uses site, host, url, or domain as the field name. Otis
-- Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Fri, Oct 12, 2012 at 2:30 AM, Tolga to...@ozses.net wrote:
Hi, I use Nutch to crawl my website and index to Solr. However, how can I search for a piece of content in a specific website? I use multiple URLs. Regards,
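As an aside, the usual way to restrict Solr results to one site is a filter query (fq) on an exact-match field rather than a pattern on the tokenized url field. A minimal sketch in Python, assuming the indexed string-typed host field from the Nutch schema shown later in this thread and the default Solr location; it only builds the request URL:

```python
from urllib.parse import urlencode

def site_query(keyword: str, site: str) -> str:
    # Restrict matches to one site with a filter query on the exact-match
    # "host" field instead of pattern-matching the tokenized "url" field.
    params = {
        "q": f"content:{keyword}",
        "fq": f"host:{site}",
        "wt": "json",
    }
    return "http://localhost:8983/solr/select?" + urlencode(params)

print(site_query("keyword", "example.net"))
```

Because fq is a separate parameter, the content query and the site restriction cannot be accidentally OR-ed together by the default query parser.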
Search in specific website
Hi, I use Nutch to crawl my website and index to Solr. However, how can I search for a piece of content in a specific website? I use multiple URLs. Regards,
Search in body
Hi, My previous schema didn't have body defined as a field, so I added it and searched for body:Smyrna, but no results turned up. What am I doing wrong? Regards,
Re: Search in body
I had no idea I had to index again, thanks for the heads up.

On 10/09/2012 02:58 PM, Rafał Kuć wrote:
Hello! After altering your schema.xml, have you indexed your documents again? It would be nice to see how your schema.xml looks and an example of the data, because otherwise we can only guess.
Re: Search in body
I've just indexed again, and no luck. Below is my schema:

<schema name="nutch" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>
    <!-- fields for index-basic plugin -->
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true" required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="text" type="text" stored="true" indexed="true"/>
    <field name="body" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>
    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="contentLength" type="long" stored="true" indexed="false"/>
    <field name="lastModified" type="date" stored="true" indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>
    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>
    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for feed plugin (tag is also used by microformats-reltag) -->
    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true" indexed="true"/>
    <field name="updatedDate" type="date" stored="true" indexed="true"/>
    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

I don't know how to show you example data; my URL is http://www.sabanciuniv.edu

Regards,

On 10/09/2012 02:58 PM, Rafał Kuć wrote:
Hello! After altering your schema.xml, have you indexed your documents again? It would be nice to see how your schema.xml looks and an example of the data, because otherwise we can only guess.
Re: Search in body
I was expecting to be able to search in the body, but apparently I don't need it, according to Markus.

Regards,

On 10/09/2012 03:27 PM, Rafał Kuć wrote:
Hello! I assume you've added the body field, but you don't populate it. As far as I remember, Nutch doesn't fill the body field by default. What are you expecting to have in the body field?
I don't understand
Hi, There are two servers with the same configuration, and I crawl the same URL. One of them gives the following error:

Caused by: org.apache.solr.common.SolrException: ERROR: [doc=http://bilgisayarciniz.org/] multiple values encountered for non multiValued copy field text: bilgisayarciniz web hizmetleri

I really fail to understand. Why does this happen?

Regards,

PS: Neither server has multiValued=true for the title field.
Re: I don't understand
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>
    <!-- fields for index-basic plugin -->
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true" required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>
    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="contentLength" type="long" stored="true" indexed="false"/>
    <field name="lastModified" type="date" stored="true" indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>
    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>
    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for feed plugin (tag is also used by microformats-reltag) -->
    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="text" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true" indexed="true"/>
    <field name="updatedDate" type="date" stored="true" indexed="true"/>
    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>
    <copyField source="*" dest="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

These schemas mention Nutch because the Nutch tutorial tells me to overwrite Solr's schema with its own.

On 10/08/2012 01:33 PM, Jan Høydahl wrote:
Hi, Please describe your environment better:
* How do you crawl, using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr?
* Can you share your schema and other relevant config with us?
-- Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Oct 2012, at 12:11, Tolga to...@ozses.net wrote:
Hi, There are two servers with the same configuration, and I crawl the same URL. One of them gives the following error:

Caused by: org.apache.solr.common.SolrException: ERROR: [doc=http://bilgisayarciniz.org/] multiple values encountered for non multiValued copy field text: bilgisayarciniz web hizmetleri

I really fail to understand. Why does this happen? Regards, PS: Neither server has multiValued=true for the title field.
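The error in this thread comes from the catch-all copyField source="*" dest="text" funnelling every field into text while text is declared single-valued. A minimal sketch of one fix, assuming you want to keep the catch-all copy, is to declare the destination multiValued; the fragment below is illustrative, not the exact Nutch schema, and is checked for well-formedness with Python's standard XML parser. Note that copyField itself takes only source and dest (plus maxChars), not indexed/stored attributes:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment, not the full Nutch schema: a catch-all copyField
# needs a multiValued destination, otherwise Solr rejects any document in
# which more than one source field carries a value.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.4">
  <fields>
    <field name="text" type="text" stored="true" indexed="true" multiValued="true"/>
  </fields>
  <copyField source="*" dest="text"/>
</schema>
"""

root = ET.fromstring(SCHEMA_FRAGMENT)
text_field = root.find("fields/field[@name='text']")
print(text_field.get("multiValued"))  # prints: true
```

The alternative fix is to narrow the copyField sources so that at most one value ever lands in text; either way the schema change only takes effect after reindexing.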
Solr search
Hi, I installed Solr and Nutch on a server, crawled with Nutch, and searched at http://localhost:8983/solr/, to no avail; it turns up no results. What should I do? Regards,
Re: Solr search
Nope. Nutch says "Adding x documents" and then "Error adding title 'Sabancı University'".

On 10/04/2012 03:59 PM, Otis Gospodnetic wrote:
Hi, Search for *:* to retrieve all docs. Got anything? Otis
-- Performance Monitoring - http://sematext.com/spm

On Oct 4, 2012 5:50 AM, Tolga to...@ozses.net wrote:
Hi, I installed Solr and Nutch on a server, crawled with Nutch, and searched at http://localhost:8983/solr/, to no avail; it turns up no results. What should I do? Regards,
Re: Solr search
Here are the last 100 lines of my log:

2012-10-03 12:52:45,761 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:45,761 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:52:45,761 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:48,807 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:48,807 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:52:48,807 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:51,822 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:51,822 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:52:51,822 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:54,827 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:54,828 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:52:54,828 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:57,834 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:57,834 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:52:57,834 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:00,842 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:00,842 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:00,842 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:03,958 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:03,958 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:03,958 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:06,809 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:06,810 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:06,810 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:09,855 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:09,856 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:09,856 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:12,870 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:12,870 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:12,870 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:15,877 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:15,878 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:15,878 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:18,882 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:18,882 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:18,882 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:21,889 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:21,889 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:21,889 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:25,005 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:25,006 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:25,006 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:27,858 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:27,858 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-10-03 12:53:27,858 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:30,902 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:30,903 INFO anchor.AnchorIndexingFilter -
Re: Solr search
The word 'commit' appears in the logs of both the failed attempt and the successful attempt on another server with another URL.

On 10/05/2012 07:18 AM, Jack Krupansky wrote:
I wonder if Nutch added documents but failed before it sent a commit to Solr. Do you see the commit in the Solr log file? If Solr is still running, you could manually send a commit yourself.
-- Jack Krupansky

-----Original Message----- From: Tolga Sent: Friday, October 05, 2012 12:14 AM To: solr-user@lucene.apache.org Subject: Re: Solr search

Nope. Nutch says "Adding x documents" and then "Error adding title 'Sabancı University'".

On 10/04/2012 03:59 PM, Otis Gospodnetic wrote:
Hi, Search for *:* to retrieve all docs. Got anything? Otis
-- Performance Monitoring - http://sematext.com/spm

On Oct 4, 2012 5:50 AM, Tolga to...@ozses.net wrote:
Hi, I installed Solr and Nutch on a server, crawled with Nutch, and searched at http://localhost:8983/solr/, to no avail; it turns up no results. What should I do? Regards,
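Jack's suggestion of sending a commit manually can be sketched as below. This is an illustrative helper, assuming the default /update handler on localhost:8983 (adjust the base URL for your server); it only builds the request here rather than sending it:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen  # urlopen(req) would send it

def build_commit_request(base="http://localhost:8983/solr"):
    # On these Solr versions a request to /update?commit=true makes the
    # update handler issue a commit, which makes pending adds searchable.
    url = base + "/update?" + urlencode({"commit": "true"})
    return Request(url)

req = build_commit_request()
print(req.full_url)  # prints: http://localhost:8983/solr/update?commit=true
```

Until a commit is issued, documents Nutch added are held in the update log/buffer and do not show up in query results, which would explain an empty *:* search.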
Error while indexing with Nutch
Hi, I'm trying to crawl my website with Nutch, and I think Nutch completed properly. However, I got these errors while the results were being indexed. To my knowledge it provides no information except "Severe errors in the configuration". What is the problem? Or is there a tool to test my configuration? Thanks,

java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-09-10 11:06:33
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread "main" java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
    ... 16 more
Caused by: org.apache.solr.common.SolrException: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
- org.apache.solr.common.SolrException: Schema Parsing Failed: A pseudo attribute name is expected.
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:123)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mor

Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
- org.apache.solr.common.SolrException: Schema Parsing Failed: A pseudo attribute name is expected.
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
    at
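The "Schema Parsing Failed" errors in these threads are plain XML well-formedness problems, so any XML parser can serve as the configuration-testing tool asked about above (xmllint --noout schema.xml does the same from the shell). A minimal sketch with Python's standard library; the file names are examples only:

```python
import xml.etree.ElementTree as ET

def check_well_formed(path):
    """Return None if the XML file parses cleanly, else the parser's message."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as exc:
        return str(exc)

# Demo on an in-memory string: an unquoted attribute value, one of the
# mistakes that produces this class of error, is caught before Solr
# ever has to start up.
try:
    ET.fromstring('<field name=id type=string/>')
    print("well-formed")
except ET.ParseError as exc:
    print("parse error:", exc)
```

Running check_well_formed("schema.xml") (and on solrconfig.xml) after every hand edit catches these mistakes up front, instead of discovering them through a Hadoop job failure.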
Fwd: Error while indexing with Nutch
I think I found it: I had closed the XML tag with a "/". :S Thanks anyway,

-------- Original Message --------
Subject: Error while indexing with Nutch
Date: Mon, 10 Sep 2012 11:55:02 +0300
From: Tolga to...@ozses.net
To: solr-user@lucene.apache.org

[quoted message and stack trace identical to "Error while indexing with Nutch" above]
Start up errors
Hi, When I started Solr, I got the following errors. The same errors appear at http://www.example.com:8983/solr

SEVERE: Exception during parsing file: schema:org.xml.sax.SAXParseException: Open quote is expected for attribute {1} associated with an element type source.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:807)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:460)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:277)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2756)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:232)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at org.apache.solr.core.Config.init(Config.java:159)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:418)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:224)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:194)
        at org.mortbay.start.Main.start(Main.java:534)
        at org.mortbay.start.Main.start(Main.java:441)
        at org.mortbay.start.Main.main(Main.java:119)

4/09/2012 1:14:29 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: Open quote is expected for attribute {1} associated with an element type source.
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478) at
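The Xerces message is its way of saying that some attribute value in schema.xml is missing its opening double quote, and the mention of "source" suggests an unquoted source= attribute, such as on a copyField element. A hypothetical before/after (the actual offending line may differ):

```xml
<!-- Broken: unquoted attribute values stop the parser cold. -->
<copyField source=* dest=text/>
<!-- Fixed: every attribute value must be quoted. -->
<copyField source="*" dest="text"/>
```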
missing core name in path
Hi, I've started Solr as usual, and when I browsed to http://www.example.com:8983/solr/admin, I got:

HTTP ERROR 404
Problem accessing /solr/admin/index.jsp. Reason: missing core name in path
Powered by Jetty://

Also, below are the lines I got when starting it:

SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:224)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:194)
        at org.mortbay.start.Main.start(Main.java:534)
        at org.mortbay.start.Main.start(Main.java:441)
        at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.NumberFormatException: multiple points
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
        at java.lang.Float.parseFloat(Float.java:422)
        at org.apache.solr.core.Config.getFloat(Config.java:307)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:430)
        ... 31 more
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/usr/local/solr/SOLR/example
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2012-08-16 13:43:03.105:INFO::Started SocketConnector@0.0.0.0:8983
2012-08-16 13:45:24.162:WARN::/solr/admin/
java.lang.IllegalStateException: STREAM
        at org.mortbay.jetty.Response.getWriter(Response.java:616)
        at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:187)
        at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:180)
        at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:237)
        at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:173)
        at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:124)
        at org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:415) at
Re: missing core name in path
Sorry for the late reply. I didn't install it myself; our sysadmin did, based on a tutorial I had written up from my own experience. The version is 3.6.1

On 08/16/2012 02:28 PM, Jack Krupansky wrote:
Compare your current schema.xml to a previous known good copy (or to the original from the Solr example) and see what changes have occurred. Maybe you were viewing it in some editor and accidentally hit some keys that corrupted the format. And, tell us what release of Solr you are using.
-- Jack Krupansky
-Original Message- From: Muzaffer Tolga Özses Sent: Thursday, August 16, 2012 6:57 AM To: solr-user@lucene.apache.org Subject: missing core name in path

Hi, I've started Solr as usual, and when I browsed to http://www.example.com:8983/solr/admin, I got:

HTTP ERROR 404
Problem accessing /solr/admin/index.jsp. Reason: missing core name in path
Powered by Jetty://

Also, below are the lines I got when starting it:

SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:224)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:194)
        at org.mortbay.start.Main.start(Main.java:534)
        at org.mortbay.start.Main.start(Main.java:441)
        at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.NumberFormatException: multiple points
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
        at java.lang.Float.parseFloat(Float.java:422)
        at org.apache.solr.core.Config.getFloat(Config.java:307)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:430)
        ... 31 more
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/usr/local/solr/SOLR/example
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2012-08-16 13:43:03.105:INFO::Started SocketConnector@0.0.0.0:8983
2012-08-16 13:45:24.162:WARN::/solr/admin
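The "Caused by" line is the real clue here: Float.parseFloat is being called on a value from schema.xml (most likely the version attribute of the schema element), and "multiple points" means the value contains more than one dot, e.g. "1.4.0.1" instead of "1.4". A quick hedged way to check, sketched against a sample file since the real path depends on the install (typically something like solr/conf/schema.xml):

```shell
# Recreate a line like the suspect one, then extract the version
# attribute; a value with more than one dot cannot be parsed as a float.
printf '<schema name="nutch" version="1.4.0.1">\n' > /tmp/schema-head.xml
grep -o 'version="[^"]*"' /tmp/schema-head.xml
# prints: version="1.4.0.1"
```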
Highlighting and excerpt
Hi, Two separate things asked in one thread... I am crawling my websites with nutch. When I index them, I'd like to be able to highlight my keyword and display an excerpt containing that keyword. I found a solution for highlighting, but what can I do about the excerpt? Thanks and regards,
Re: Highlighting and excerpt
I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB was emphasized? On 5/31/12 3:54 PM, Jack Krupansky wrote: Since highlighting, by definition, does highlight terms in excerpts (snippets or fragments from a text field), what else is it that you need? -- Jack Krupansky -Original Message- From: Tolga Sent: Thursday, May 31, 2012 4:55 AM To: solr-user@lucene.apache.org Subject: Highlighting and excerpt Hi, Two separate things asked in one thread... I am crawling my websites with nutch. When I index them, I'd like to be able to highlight my keyword and display an excerpt containing that keyword. I found a solution for highlighting, but what can I do about the excerpt? Thanks and regards,
Re: Highlighting and excerpt
You mean http://www.example.com:8983/solr/browse? It says unknown field 'cat'. On 5/31/12 4:16 PM, Jack Krupansky wrote: Yes, that is what highlighting does - it extracts an excerpt and highlights search terms. You said you have highlighting working, so what else is it that you need? Try /browse in the Solr example. It does exactly what your example shows. So, what else is it that you are trying to do? Or if something isn't working, what specifically isn't working? -- Jack Krupansky -Original Message- From: Tolga Sent: Thursday, May 31, 2012 9:08 AM To: solr-user@lucene.apache.org Subject: Re: Highlighting and excerpt I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB was emphasized? On 5/31/12 3:54 PM, Jack Krupansky wrote: Since highlighting, by definition, does highlight terms in excerpts (snippets or fragments from a text field), what else is it that you need? -- Jack Krupansky -Original Message- From: Tolga Sent: Thursday, May 31, 2012 4:55 AM To: solr-user@lucene.apache.org Subject: Highlighting and excerpt Hi, Two separate things asked in one thread... I am crawling my websites with nutch. When I index them, I'd like to be able to highlight my keyword and display an excerpt containing that keyword. I found a solution for highlighting, but what can I do about the excerpt? Thanks and regards,
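The "unknown field 'cat'" error happens because the /browse handler in the Solr example's solrconfig.xml is wired to the example schema's fields (cat, price, etc.), not the Nutch schema. The emphasis in the screenshot only needs plain highlighting parameters on /select. A hedged sketch (field names taken from the Nutch schema above; the curl line is printed rather than executed since it assumes a live Solr at localhost:8983):

```shell
# hl=true turns highlighting on, hl.fl picks the fields, hl.fragsize
# controls excerpt length, hl.simple.pre/post wrap the matched term
# (here URL-encoded <b> and </b>).
Q='q=content:TCMB'
HL='hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=120&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E'
echo "curl 'http://localhost:8983/solr/select?${Q}&${HL}'"
```

Note that hl.fl fields generally need stored="true" to be highlighted, which the Nutch schema's content field is not; that is a separate change to make first.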
org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Hi, I am getting this error:

[doc=null] missing required field: id
request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-05-21 11:44:29,953 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/> What to do? Regards,
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. Regards, On 5/21/12 1:20 PM, Michael Kuhlmann wrote: On 21.05.2012 12:07, Tolga wrote: Hi, I am getting this error: [doc=null] missing required field: id [...] I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/> What to do? Simply make sure that every document you're sending to Solr contains this id field. I assume it's declared as your unique id field, so it's mandatory. Greetings, Kuli
Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id
Yes. On 5/21/12 1:49 PM, Michael Kuhlmann wrote: On 21.05.2012 12:40, Tolga wrote: How do I verify it exists? I've been crawling the same site and it wasn't giving an error on Thursday. It depends on what you're doing. Are you using nutch? -Kuli
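For anyone hitting the same [doc=null] error: when the schema declares id as the uniqueKey, every document in the batch must carry it, and Nutch normally fills it with the page URL at indexing time. A minimal hand-built add (all values invented) shows the shape Solr expects:

```shell
# Build a one-document update file; the id field is the uniqueKey and
# must be present on every <doc>.
cat > /tmp/doc.xml <<'EOF'
<add>
  <doc>
    <field name="id">http://example.com/page.html</field>
    <field name="content">some page text</field>
  </doc>
</add>
EOF
# Sending it would look like this (printed, not run -- assumes a live Solr):
echo "curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary @/tmp/doc.xml"
```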
UI
Hi, Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Unknown field
Hi, Is there a way to know what fields to add to schema.xml prior to crawling with nutch, rather than crawling over and over again and fixing the fields one by one? Regards,
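One hedged alternative to re-crawling after every "unknown field" error is a catch-all dynamic field that quietly accepts anything the indexer sends; the type and pattern below are illustrative, not from any schema in this thread:

```xml
<!-- Accept (and discard) any field not declared explicitly. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<dynamicField name="*" type="ignored"/>
```

Explicitly declared fields still take precedence over the dynamic pattern, so this only swallows the genuinely unknown ones.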
Search plain text
Hi, I have 96 documents added to index, and I would like to be able to search in them in plain text, without using complex search queries. How can I do that? Regards,
Re: Search plain text
My website is http://liseyazokulu.sabanciuniv.edu/ it has the word barınma in it, and I want to be able to search for that by just typing barınma in the admin interface. On 5/18/12 3:40 PM, Jack Krupansky wrote: Could you give us some examples of the kinds of search you want to do? Besides, keywords and quoted phrases? The dismax query parser may be good enough. -- Jack Krupansky -Original Message- From: Tolga Sent: Friday, May 18, 2012 6:27 AM To: solr-user@lucene.apache.org Subject: Search plain text Hi, I have 96 documents added to index, and I would like to be able to search in them in plain text, without using complex search queries. How can I do that? Regards,
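The dismax parser Jack mentions is what lets a bare keyword like barınma search several fields at once without a field: prefix. A hedged sketch, with qf field names taken from the Nutch schema earlier in this archive and the host assumed (the curl line is printed rather than run):

```shell
# barınma URL-encoded is bar%C4%B1nma; qf=title%5E2+content means
# "search title (boosted 2x) and content".
PARAMS='q=bar%C4%B1nma&defType=dismax&qf=title%5E2+content'
echo "curl 'http://localhost:8983/solr/select?${PARAMS}'"
```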
copyField
Hi, I've put the line copyField=* dest=text stored=true indexed=true/ in my schema.xml and restarted Solr, crawled my website, and indexed (I've also committed but do I really have to commit?). But I still have to search with content:mykeyword at the admin interface. What do I have to do so that I can search only with mykeyword? Regards,
Re: copyField
I'll make sure to do that. Thanks. Sent from my phone. On 18 May 2012, at 17:40, Jack Krupansky j...@basetechnology.com wrote: Did you also delete all existing documents from the index? Maybe your crawl did not re-index documents that were already in the index or that hadn't changed since the last crawl, leaving the old index data as it was before the change. -- Jack Krupansky -Original Message- From: Tolga Sent: Friday, May 18, 2012 9:54 AM To: solr-user@lucene.apache.org Subject: copyField Hi, I've put the line copyField=* dest=text stored=true indexed=true/ in my schema.xml and restarted Solr, crawled my website, and indexed (I've also committed but do I really have to commit?). But I still have to search with content:mykeyword at the admin interface. What do I have to do so that I can search only with mykeyword? Regards,
Re: copyField
Default field? I'm not sure but I think I do. Will have to look. Sent from my phone. On 18 May 2012, at 18:11, Yury Kats yuryk...@yahoo.com wrote: On 5/18/2012 9:54 AM, Tolga wrote: Hi, I've put the line copyField=* dest=text stored=true indexed=true/ in my schema.xml and restarted Solr, crawled my website, and indexed (I've also committed but do I really have to commit?). But I still have to search with content:mykeyword at the admin interface. What do I have to do so that I can search only with mykeyword? Do you have the default field defined?
Re: copyField
Oh, that one. Yes, I have it. Sent from my phone. On 18 May 2012, at 23:14, Yury Kats yuryk...@yahoo.com wrote: On 5/18/2012 4:02 PM, Tolga wrote: Default field? I'm not sure but I think I do. Will have to look. http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
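For the record, copyField takes only source and dest attributes; stored and indexed belong on the destination field's own definition, and the default search field is a separate element. A hedged sketch of the three pieces working together (field and type names assumed to match the schema in this thread):

```xml
<!-- Destination field: its own definition controls indexed/stored. -->
<field name="text" type="text" stored="false" indexed="true" multiValued="true"/>
<!-- Copy every field's content into it. -->
<copyField source="*" dest="text"/>
<!-- Make bare queries (no field: prefix) hit it. -->
<defaultSearchField>text</defaultSearchField>
```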
curl or nutch
Hi, I have been trying for a week. I really want to get a start, so what should I use? curl or nutch? I want to be able to index pdf, xml etc. and search within them as well. Regards,
Re: curl or nutch
Can nutch crawl/index files as well? On 5/16/12 12:29 PM, findbestopensource wrote: You could very well use Solr. It has support to index the PDF and XML files. If you want to index websites and search using page rank then choose Nutch. Regards Aditya www.findbestopensource.com On Wed, May 16, 2012 at 1:13 PM, Tolga to...@ozses.net wrote: Hi, I have been trying for a week. I really want to get a start, so what should I use? curl or nutch? I want to be able to index pdf, xml etc. and search within them as well. Regards,
Index a URL
Hi, I have a few questions, please bear with me: 1- I have a theory: nutch may be used to index to solr when we don't have access to the URL's file system, while we can use curl when we do. Am I correct? 2- A tutorial I have been reading talks about different levels of id. Is there such a thing (exid6, exid7 etc.)? 3- When I use curl "http://localhost:8983/solr/update/extract?literal.id=exid7&commit=true" -F myfile=@serialized-form.html, I get ERROR: [doc=exid7] unknown field 'ignored_link'. Is this something exid7 gives me? Where does this field ignored_link come from? Do I need to add all these fields to schema.xml in order not to get such an error? What is the safest way? Regards,
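On the ignored_link question: a likely source (an assumption, since the poster's solrconfig.xml isn't shown) is the uprefix=ignored_ setting on the example /update/extract handler, which renames Tika metadata fields the schema doesn't declare, here a field named link. The example schema pairs that prefix with a dynamic field so such fields are dropped quietly; if the schema lacks it, the error above appears. A hedged sketch of the pair:

```xml
<!-- Silently swallow anything the extract handler maps to ignored_*. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored"/>
```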
Re: Fwd: Delete documents
That worked, thanks a lot Jack :) On 5/11/12 7:44 AM, Jack Krupansky wrote: Try using the actual id of the document rather than the shell substitution variable - if you're trying to delete one document. To delete all documents, use delete by query: <delete><query>*:*</query></delete> See: http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F -- Jack Krupansky -Original Message- From: Tolga Sent: Friday, May 11, 2012 12:31 AM To: solr-user@lucene.apache.org Subject: Fwd: Delete documents Anyone at all? Original Message Subject: Delete documents Date: Thu, 10 May 2012 22:59:49 +0300 From: Tolga to...@ozses.net To: solr-user@lucene.apache.org Hi, I've been reading http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the section Deleting Data, I've edited schema.xml to include a field named id, issued the command for f in *; do java -Ddata=args -Dcommit=yes -jar post.jar "<delete><id>$f</id></delete>"; done, went on to the stats page only to find no files were de-indexed. How can I do that? Regards,
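For reference, the two working variants look like this with the example's post.jar (paths and jar location assumed to be the Solr example directory; the commands that need a running Solr are shown commented out):

```shell
# Delete each file's document by id, quoting so the shell doesn't eat
# the angle brackets:
#   for f in *; do java -Ddata=args -Dcommit=yes -jar post.jar "<delete><id>$f</id></delete>"; done
# Or wipe the whole index with one delete-by-query:
#   java -Ddata=args -Dcommit=yes -jar post.jar "<delete><query>*:*</query></delete>"
# The equivalent raw HTTP call (printed here, not run):
echo "curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'"
```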
Error messages
Hi, Apache servers are returning my post with the status messages HTML_FONT_SIZE_HUGE,HTML_MESSAGE,HTTP_ESCAPED_HOST,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL,URI_HEX,WEIRD_PORT. I've tried clearing all formatting and a re-post, but the same thing occurred. What to do? Regards,
Delete documents
Hi, I've been reading http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the section Deleting Data, I've edited schema.xml to include a field named id, issued the command for f in *; do java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>$f</id></delete>"; done, went on to the stats page only to find no files were de-indexed. How can I do that? Regards,
Delete data
Sorry, commit=no should have been commit=yes in my previous post. Regards,
Fwd: Delete documents
Anyone at all? Original Message Subject: Delete documents Date: Thu, 10 May 2012 22:59:49 +0300 From: Tolga to...@ozses.net To: solr-user@lucene.apache.org Hi, I've been reading http://lucene.apache.org/solr/api/doc-files/tutorial.html and in the section Deleting Data, I've edited schema.xml to include a field named id, issued the command for f in *; do java -Ddata=args -Dcommit=yes -jar post.jar "<delete><id>$f</id></delete>"; done, went on to the stats page only to find no files were de-indexed. How can I do that? Regards,
Re: CLASSPATH
Otis, I've just subscribed to the nutch mailing list; however, it seems to be a very low-volume one (at least that's the impression I got), so can't I ask here? Regards, On 5/8/12 11:54 PM, Otis Gospodnetic wrote: Tolga - you should ask on the Nutch mailing list, not Solr one. :) Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Tolga to...@ozses.net To: solr-user@lucene.apache.org Sent: Tuesday, May 8, 2012 4:30 PM Subject: CLASSPATH Hi, Probably off-topic, but what directory should I export to CLASSPATH environment variable so that I can begin using nutch? Regards,
CLASSPATH
Hi, Probably off-topic, but what directory should I export to CLASSPATH environment variable so that I can begin using nutch? Regards,
PDF indexing
Hi, From what I have read, I think I have to use Tika (?) to index PDF, xls, doc, etc. files. How do I start? Do I run mvn clean install in the source directory to get all the jar files? CentOS doesn't provide mvn, so how do I build Tika after getting it from http://maven.apache.org ? Sorry for the noob questions, I'm just beginning.
Re: PDF indexing
On 05/07/2012 10:35 PM, Jack Krupansky wrote: Try SolrCell (ExtractingRequestHandler). See: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -Original Message- From: Tolga Sent: Monday, May 07, 2012 3:24 PM To: solr-user@lucene.apache.org Subject: PDF indexing Hi, From what I have read, I think I have to use Tika (?) to index PDF, xls, doc, etc files. How do I start? Do I use mvn clean install in the source directory to get all the jar files to begin? Centos doesn't provide mvn, how do I build Tika after getting it from http://maven.apache.org ? Sorry for the noob questions, I'm just beginning. Jack, Thank you very much, I've managed to index a pdf file after a few tries. With this curl syntax, would it be possible to index an xml file as well or do we need to use java -jar post.jar file.xml? Or let me put it this way, how is post.jar different than curl? Regards,
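To the closing question: post.jar is just a small convenience wrapper that POSTs files to the /update handler over HTTP, which is exactly what curl does by hand, so an XML file can go either way. A hedged sketch (both commands printed rather than executed, since they assume a running Solr and the example's post.jar):

```shell
# Equivalent ways to send file.xml to the update handler:
echo "java -jar post.jar file.xml"
echo "curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary @file.xml"
# Note: /update expects Solr's own <add><doc>... XML; arbitrary XML or
# binary content goes through /update/extract (SolrCell) instead, as
# the PDF did above.
```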
Re: Direct control over document position in search results
I looked at that; elevate is a way to boost particular documents based on query terms used. I was thinking in a more general sense... For instance, when Google displays search results, the 4th result (typically) is news results, then YouTube results come in at another fixed position or better... This is not based on query term, but appears to be based on a document-type meta-data field. We can certainly create the meta-data in Solr, but I can't seem to figure out how to manipulate the search results to the extent I need. On 2/24/09 9:12 AM, Steven A Rowe sar...@syr.edu wrote: Hi Tolga, Here's a good place to start: http://wiki.apache.org/solr/QueryElevationComponent Steve On 2/23/2009 at 7:47 PM, Ercan, Tolga wrote: I was wondering if there was any facility to directly manipulate search results based on business criteria to place documents at a fixed position in those results. For example, when I issue a query, the first four results would be based on natural search relevancy, then the fifth result would be based on the most relevant document when doctype:video (if I had a doctype field of course), then results 6...* would resume natural search relevancy? Or perhaps a variation on this, if the document where doctype:video would appear at a fixed position or better... For example, if somebody searched for my widget video, there would be a relevant document at a higher position than #5...
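For the term-based half of this, elevation is configured in elevate.xml; the query text and doc id below are invented for illustration. Position-by-doctype, as noted, is not something the component covers; one common workaround is issuing a second query filtered with fq=doctype:video and splicing that result into slot five client-side:

```xml
<!-- elevate.xml sketch: pin specific documents for a given query text. -->
<elevate>
  <query text="my widget video">
    <doc id="video-doc-1"/>
  </query>
</elevate>
```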
Direct control over document position in search results
Hello, I was wondering if there was any facility to directly manipulate search results based on business criteria to place documents at a fixed position in those results. For example, when I issue a query, the first four results would be based on natural search relevancy, then the fifth result would be based on the most relevant document when doctype:video (if I had a doctype field of course), then results 6...* would resume natural search relevancy? Or perhaps a variation on this, if the document where doctype:video would appear at a fixed position or better... For example, if somebody searched for my widget video, there would be a relevant document at a higher position than #5... Thanks! ~t