Re: Search in specific website

2012-10-16 Thread Tolga

Hi again,

In the Nutch list, I was told to use url:example\.net AND content:some 
keyword, and so I did. However, I still get results from both of my URLs. 
Why does this happen?
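
For reference, one common way to restrict results to a single site — a sketch 
assuming the stock Nutch schema, where the index-basic plugin fills a host 
field — is a filter query rather than a url: clause:

  curl "http://localhost:8983/solr/select?q=content:keyword&fq=host:example.net"

If host isn't populated in your index, the field to filter on will differ.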


Regards,

PS: I've re(crawl|index)ed my data.

On 10/12/2012 05:07 PM, Otis Gospodnetic wrote:

Hi Tolga,

You'll get more help on the Nutch mailing list. I don't know the
schema Nutch uses for Solr off the top of my head, so I can't tell you
whether the field it uses is site, host, url, domain, or something else...

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 12, 2012 at 2:30 AM, Tolga to...@ozses.net wrote:

Hi,

I use Nutch to crawl my website and index to Solr. However, how can I search
for a piece of content in a specific website? I use multiple URLs.

Regards,




Search in specific website

2012-10-12 Thread Tolga

Hi,

I use Nutch to crawl my website and index to Solr. However, how can I 
search for a piece of content in a specific website? I use multiple URLs.


Regards,


Search in body

2012-10-09 Thread Tolga

Hi,

My previous schema didn't have body defined as a field, so I added it and 
searched for body:Smyrna, but no results turned up. What am I doing wrong?


Regards,


Re: Search in body

2012-10-09 Thread Tolga

I had no idea I had to index again, thanks for the heads up.
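
For anyone following along: with Nutch 1.x, re-indexing roughly means 
re-running the solrindex step over the existing crawl data, e.g.

  bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

This is only a sketch; the exact arguments depend on the Nutch version and on 
where the crawl directories live.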

On 10/09/2012 02:58 PM, Rafał Kuć wrote:

Hello!

After altering your schema.xml, have you indexed your documents again?

It would be nice to see what your schema.xml looks like and an example of
the data, because otherwise we can only guess.





Re: Search in body

2012-10-09 Thread Tolga

I've just indexed again, and no luck.

Below is my schema

<schema name="nutch" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
        omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="text" class="solr.TextField"
        positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField"
        positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>

    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>

    <!-- fields for index-basic plugin -->
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true"
        required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="text" type="text" stored="true" indexed="true"/>
    <field name="body" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>

    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true"
        multiValued="true"/>

    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true"
        multiValued="true"/>
    <field name="contentLength" type="long" stored="true"
        indexed="false"/>
    <field name="lastModified" type="date" stored="true"
        indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>

    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>

    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true"
        indexed="true" multiValued="true"/>

    <!-- fields for feed plugin (tag is also used by
        microformats-reltag) -->

    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true"
        multiValued="true"/>

    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true"
        indexed="true"/>
    <field name="updatedDate" type="date" stored="true"
        indexed="true"/>

    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true"
        multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

I don't know how to show you example data; my URL is 
http://www.sabanciuniv.edu.


Regards,

On 10/09/2012 02:58 PM, Rafał Kuć wrote:

Hello!

After altering your schema.xml, have you indexed your documents again?

It would be nice to see what your schema.xml looks like and an example of
the data, because otherwise we can only guess.





Re: Search in body

2012-10-09 Thread Tolga
I was expecting to be able to search in the body, but apparently I don't 
need it according to Markus.


Regards,

On 10/09/2012 03:27 PM, Rafał Kuć wrote:

Hello!

I assume you've added the body field, but you don't populate it. As
far as I remember, Nutch doesn't fill the body field by default. What
are you expecting to have in the body field?
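
With the stock Nutch schema the page text normally lands in the content field, 
so a quick sanity check — a sketch assuming the default /select handler — is:

  curl "http://localhost:8983/solr/select?q=content:Smyrna"

If that returns hits while body:Smyrna does not, the body field is simply 
never being filled.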





I don't understand

2012-10-08 Thread Tolga

Hi,

There are two servers with the same configuration. I crawl the same URL. 
One of them is giving the following error:


Caused by: org.apache.solr.common.SolrException: ERROR: 
[doc=http://bilgisayarciniz.org/] multiple values encountered for non 
multiValued copy field text: bilgisayarciniz web hizmetleri


I really fail to understand. Why does this happen?

Regards,

PS: Neither server has multiValued="true" for the title field.


Re: I don't understand

2012-10-08 Thread Tolga
            ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField"
        positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>

    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>

    <!-- fields for index-basic plugin -->
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true"
        required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>

    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true"
        multiValued="true"/>

    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true"
        multiValued="true"/>
    <field name="contentLength" type="long" stored="true"
        indexed="false"/>
    <field name="lastModified" type="date" stored="true"
        indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>

    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>

    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true"
        indexed="true" multiValued="true"/>

    <!-- fields for feed plugin (tag is also used by
        microformats-reltag) -->

    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true"
        multiValued="true"/>

    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="text" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true"
        indexed="true"/>
    <field name="updatedDate" type="date" stored="true"
        indexed="true"/>

    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true"
        multiValued="true"/>
    <copyField source="*" dest="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

These schemas mention Nutch because the Nutch tutorial tells me to overwrite 
Solr's schema with its own.
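
One detail worth flagging in this schema: it copies every field into text via 
a wildcard copyField, but text is declared as a single-valued field. That 
combination is what typically produces the "multiple values encountered for 
non multiValued copy field text" error above. A sketch of one common fix — my 
assumption, not something confirmed in this thread — is to make the 
destination multi-valued and move the stored/indexed attributes off the 
copyField (they belong on the field declaration):

  <field name="text" type="text" stored="false" indexed="true" multiValued="true"/>
  <copyField source="*" dest="text"/>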



On 10/08/2012 01:33 PM, Jan Høydahl wrote:

Hi,

Please describe your environment better:

* How do you crawl, using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr
* Can you share your schema and other relevant config with us?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

8. okt. 2012 kl. 12:11 skrev Tolga to...@ozses.net:


Hi,

There are two servers with the same configuration. I crawl the same URL. One of 
them is giving the following error:

Caused by: org.apache.solr.common.SolrException: ERROR: 
[doc=http://bilgisayarciniz.org/] multiple values encountered for non 
multiValued copy field text: bilgisayarciniz web hizmetleri

I really fail to understand. Why does this happen?

Regards,

PS: Neither server has multiValued=true for title field.





Solr search

2012-10-04 Thread Tolga

Hi,

I installed Solr and Nutch on a server, crawled with Nutch, and searched 
at http://localhost:8983/solr/, to no avail; it turns up no results. 
What should I do?


Regards,


Re: Solr search

2012-10-04 Thread Tolga
Nope. Nutch says "Adding x documents" and then gives an error adding the 
document titled 'Sabancı University'.
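
For reference, the match-all check Otis suggests below can be run with 
something like

  curl "http://localhost:8983/solr/select?q=*:*&rows=1"

— a sketch assuming the default /select handler. Zero hits there means 
nothing was committed to the index at all.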


On 10/04/2012 03:59 PM, Otis Gospodnetic wrote:

Hi

Search for *:* to retrieve all docs. Got anything?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 4, 2012 5:50 AM, Tolgato...@ozses.net  wrote:


Hi,

I installed Solr and Nutch on a server, crawled with Nutch, and searched
at http://localhost:8983/solr/, to no avail. I mean it turns up no
results. What to do?

Regards,



Re: Solr search

2012-10-04 Thread Tolga

Here's the last 100 lines of my log:

2012-10-03 12:52:45,761 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:45,761 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:52:45,761 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:48,807 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:48,807 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:52:48,807 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:51,822 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:51,822 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:52:51,822 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:54,827 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:54,828 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:52:54,828 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:52:57,834 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:52:57,834 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:52:57,834 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:00,842 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:00,842 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:00,842 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:03,958 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:03,958 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:03,958 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:06,809 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:06,810 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:06,810 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:09,855 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:09,856 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:09,856 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:12,870 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:12,870 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:12,870 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:15,877 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:15,878 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:15,878 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:18,882 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:18,882 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:18,882 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:21,889 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:21,889 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:21,889 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:25,005 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:25,006 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:25,006 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:27,858 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:27,858 INFO anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2012-10-03 12:53:27,858 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-10-03 12:53:30,902 INFO indexer.IndexingFilters - Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-10-03 12:53:30,903 INFO anchor.AnchorIndexingFilter - 

Re: Solr search

2012-10-04 Thread Tolga
The word 'commit' appears both in the logs of the failed attempt and in the 
logs of a successful attempt on another server with another URL.


On 10/05/2012 07:18 AM, Jack Krupansky wrote:
I wonder if nutch added documents but failed before it sent a commit 
to Solr. Do you see the commit in the Solr log file? If Solr is still 
running, you could manually send a commit yourself.
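
(A hand-sent commit, for reference, looks roughly like

  curl "http://localhost:8983/solr/update?commit=true"

— a sketch assuming the default /update handler.)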


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Friday, October 05, 2012 12:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr search

Nope. Nutch says Adding x documents then Error adding title 'Sabancı
University'.

On 10/04/2012 03:59 PM, Otis Gospodnetic wrote:

Hi

Search for *:* to retrieve all docs. Got anything?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 4, 2012 5:50 AM, Tolgato...@ozses.net  wrote:


Hi,

I installed Solr and Nutch on a server, crawled with Nutch, and 
searched

at http://localhost:8983/solr/, to no avail. I mean it turns up no
results. What to do?

Regards,





Error while indexing with Nutch

2012-09-10 Thread Tolga

Hi,

I'm trying to crawl my website with Nutch, and I think Nutch completed 
properly. However, I got these errors when the results were being 
indexed. As far as I can tell, they don't provide much information beyond 
"Severe errors in solr configuration". What is the problem? Or is there a 
tool to test my configuration?
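
One quick check worth doing — my suggestion, not something from the original 
thread — is to run the schema and config through an XML well-formedness 
checker before starting Solr, e.g.

  xmllint --noout schema.xml solrconfig.xml

Parse errors like the "A pseudo attribute name is expected" one below usually 
show up there immediately.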


Thanks,

java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-09-10 11:06:33
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread main java.io.IOException: 
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
at 
org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)

at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)

at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error 
executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)

at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)

... 16 more
Caused by: org.apache.solr.common.SolrException: Severe errors in solr 
configuration.  Check your log files for more detailed information on 
what may be wrong.  If you want solr to continue after configuration 
errors, change: 
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml 
- 
org.apache.solr.common.SolrException: Schema Parsing Failed: A pseudo 
attribute name is expected.  at 
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478) 
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)  at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161) 
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) 
at 
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) 
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) 
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) 
 at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) 
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
 at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) 
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) 
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) 
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) 
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) 
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) 
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) 
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mor


Severe errors in solr configuration.  Check your log files for more 
detailed information on what may be wrong.  If you want solr to continue 
after configuration errors, change: 
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml 
- 
org.apache.solr.common.SolrException: Schema Parsing Failed: A pseudo 
attribute name is expected.  at 
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
 at 

Fwd: Error while indexing with Nutch

2012-09-10 Thread Tolga

Most probably I found it out: I had closed the XML tag with /. :S

Thanks anyway,


 Original Message 
Subject:Error while indexing with Nutch
Date:   Mon, 10 Sep 2012 11:55:02 +0300
From:   Tolga to...@ozses.net
To: solr-user@lucene.apache.org



Hi,

I'm trying to crawl my website with Nutch, and I think Nutch completed
properly. However, I got these errors when the results were being
indexed. As far as I can tell, they don't provide much information beyond
"Severe errors in solr configuration". What is the problem? Or is there a
tool to test my configuration?

Thanks,

java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-09-10 11:06:33
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread main java.io.IOException:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
at
org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error
executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
... 16 more
Caused by: org.apache.solr.common.SolrException: Severe errors in solr
configuration.  Check your log files for more detailed information on
what may be wrong.  If you want solr to continue after configuration
errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: Schema Parsing Failed: A pseudo
attribute name is expected.  at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)  at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
 at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mor

Severe errors in solr configuration.  Check your log files for more
detailed information on what may be wrong.  If you want solr to continue
after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: Schema Parsing

Start up errors

2012-09-04 Thread Tolga

Hi,

When I started Solr, I got the following errors. The same errors appear at 
http://www.example.com:8983/solr.


SEVERE: Exception during parsing file: 
schema:org.xml.sax.SAXParseException: Open quote is expected for 
attribute "{1}" associated with an element type "source".
at 
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at 
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
at 
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
at 
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
at 
com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:807)
at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:460)
at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:277)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2756)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at 
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at 
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at 
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:232)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)

at org.apache.solr.core.Config.init(Config.java:159)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:418)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)

at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)

at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)

at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)

at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)

4/09/2012 1:14:29 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: 
Open quote is expected for attribute "{1}" associated with an element 
type "source".

at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
at 

missing core name in path

2012-08-16 Thread Muzaffer Tolga Özses

Hi,

I've started Solr as usual, and when I browsed to 
http://www.example.com:8983/solr/admin, I got


HTTP ERROR 404

Problem accessing /solr/admin/index.jsp. Reason:

missing core name in path
Powered by Jetty://

Also, below are the lines I got when starting it:

SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: 
multiple points

at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)

at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)

at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)

at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)

at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.NumberFormatException: multiple points
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)

at java.lang.Float.parseFloat(Float.java:422)
at org.apache.solr.core.Config.getFloat(Config.java:307)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:430)
... 31 more

Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/usr/local/solr/SOLR/example
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or 
JNDI)

Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or 
JNDI)

Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2012-08-16 13:43:03.105:INFO::Started SocketConnector@0.0.0.0:8983
2012-08-16 13:45:24.162:WARN::/solr/admin/
java.lang.IllegalStateException: STREAM
at org.mortbay.jetty.Response.getWriter(Response.java:616)
at 
org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:187)
at 
org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:180)
at 
org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:237)
at 
org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:173)
at 
org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:124)
at 
org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:415)

at 

Re: missing core name in path

2012-08-16 Thread Muzaffer Tolga Özses

Sorry for the late reply.

I didn't install it; our sysadmin did, based on my tutorialised 
experience. The version is 3.6.1.

On 08/16/2012 02:28 PM, Jack Krupansky wrote:
Compare your current schema.xml to a previous known good copy (or to 
the original from the Solr example) and see what changes have 
occurred. Maybe you were viewing it in some editor and accidentally 
hit some keys that corrupted the format.


And, tell us what release of Solr you are using.

-- Jack Krupansky

-Original Message- From: Muzaffer Tolga Özses
Sent: Thursday, August 16, 2012 6:57 AM
To: solr-user@lucene.apache.org
Subject: missing core name in path

Hi,

I've started Solr as usual, and when I browsed to
http://www.example.com:8983/solr/admin, I got

HTTP ERROR 404

Problem accessing /solr/admin/index.jsp. Reason:

missing core name in path
Powered by Jetty://

Also, below are the lines I got when starting it:

SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed:
multiple points
at 
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)

at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161) 


at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) 

at 
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)

at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) 


at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) 


at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) 


at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) 


at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) 


at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 


at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.NumberFormatException: multiple points
at
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
at java.lang.Float.parseFloat(Float.java:422)
at org.apache.solr.core.Config.getFloat(Config.java:307)
at 
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:430)

... 31 more

Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/usr/local/solr/SOLR/example
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Aug 16, 2012 1:43:03 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Aug 16, 2012 1:43:03 PM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2012-08-16 13:43:03.105:INFO::Started SocketConnector@0.0.0.0:8983
2012-08-16 13:45:24.162:WARN::/solr/admin

Highlighting and excerpt

2012-05-31 Thread Tolga

Hi,

Two separate things asked in one thread...

I am crawling my websites with Nutch. When I index them, I'd like to be 
able to highlight my keyword and display an excerpt containing that 
keyword. I found a solution for highlighting, but what can I do about the excerpt?


Thanks and regards,


Re: Highlighting and excerpt

2012-05-31 Thread Tolga
I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB 
was stressed?
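
The standard highlighting parameters produce exactly that kind of emphasized 
snippet — a sketch assuming content is the indexed text field:

  curl "http://localhost:8983/solr/select?q=content:TCMB&hl=true&hl.fl=content&hl.snippets=1"

The matched term comes back wrapped in <em> tags in the highlighting section 
of the response.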


On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in excerpts 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Highlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with Nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution for highlighting, but what can I do about the excerpt?

Thanks and regards,


Re: Highlighting and excerpt

2012-05-31 Thread Tolga
You mean http://www.example.com:8983/solr/browse? It says unknown 
field 'cat'.


On 5/31/12 4:16 PM, Jack Krupansky wrote:
Yes, that is what highlighting does - it extracts an excerpt and 
highlights search terms. You said you have highlighting working, so 
what else is it that you need?


Try /browse in the Solr example. It does exactly what your example 
shows. So, what else is it that you are trying to do? Or if something 
isn't working, what specifically isn't working?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting and excerpt

I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB
was stressed?

On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in excerpts 
(snippets or fragments from a text field), what else is it that you 
need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Highlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with Nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution for highlighting, but what can I do about the 
excerpt?


Thanks and regards, 




org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga

Hi,

I am getting this error:

[doc=null] missing required field: id

request: http://localhost:8983/solr/update?wt=javabinversion=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
at 
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-05-21 11:44:29,953 ERROR solr.SolrIndexer - java.io.IOException: 
Job failed!


I've got this entry in schema.xml: <field name="id" type="string" 
stored="true" indexed="true"/>

What to do?

Regards,


Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga
How do I verify it exists? I've been crawling the same site and it 
wasn't giving an error on Thursday.


Regards,

On 5/21/12 1:20 PM, Michael Kuhlmann wrote:

Am 21.05.2012 12:07, schrieb Tolga:

Hi,

I am getting this error:

[doc=null] missing required field: id


[...]


I've got this entry in schema.xml: <field name="id" type="string"
stored="true" indexed="true"/>
What to do?


Simply make sure that every document you're sending to Solr contains 
this id field.


I assume it's declared as your unique id field, so it's mandatory.
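
(For reference, that declaration — as in the Nutch schema quoted earlier in 
this archive — looks like

  <field name="id" type="string" stored="true" indexed="true"/>
  <uniqueKey>id</uniqueKey>

With id as the uniqueKey, any document arriving without it is rejected with 
exactly this error.)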

Greetings,
Kuli



Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga

Yes.

On 5/21/12 1:49 PM, Michael Kuhlmann wrote:

Am 21.05.2012 12:40, schrieb Tolga:

How do I verify it exists? I've been crawling the same site and it
wasn't giving an error on Thursday.


It depends on what you're doing.

Are you using nutch?

-Kuli


UI

2012-05-21 Thread Tolga

Hi,

Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Unknown field

2012-05-18 Thread Tolga

Hi,

Is there a way to know which fields to add to schema.xml prior to crawling with 
Nutch, rather than crawling over and over again and fixing the fields 
one by one?
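
For what it's worth, the usual shortcut — an assumption based on the Nutch 
tutorial mentioned elsewhere in this archive — is to copy Nutch's bundled 
schema into Solr before the first crawl instead of adding fields one at a 
time, roughly:

  cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/example/solr/conf/schema.xml

The paths here are placeholders; they depend on how Nutch and Solr were installed.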


Regards,


Search plain text

2012-05-18 Thread Tolga

Hi,

I have 96 documents added to the index, and I would like to be able to 
search them in plain text, without using complex search queries. How 
can I do that?


Regards,


Re: Search plain text

2012-05-18 Thread Tolga
My website is http://liseyazokulu.sabanciuniv.edu/; it has the word 
barınma in it, and I want to be able to search for it by just typing 
barınma in the admin interface.


On 5/18/12 3:40 PM, Jack Krupansky wrote:
Could you give us some examples of the kinds of search you want to do? 
Besides, keywords and quoted phrases?


The dismax query parser may be good enough.
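
(A dismax sketch for that — field names assumed from the Nutch schema, not 
confirmed here:

  curl "http://localhost:8983/solr/select?defType=dismax&qf=content+title&q=barınma"

With qf listing the fields to search, the bare keyword works without a 
field: prefix.)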

-- Jack Krupansky

-Original Message- From: Tolga
Sent: Friday, May 18, 2012 6:27 AM
To: solr-user@lucene.apache.org
Subject: Search plain text

Hi,

I have 96 documents added to index, and I would like to be able to
search in them in plain text, without using complex search queries. How
can I do that?

Regards,


copyField

2012-05-18 Thread Tolga

Hi,

I've put the line <copyField source="*" dest="text" stored="true" 
indexed="true"/> in my schema.xml and restarted Solr, crawled my 
website, and indexed (I've also committed, but do I really have to 
commit?). But I still have to search with content:mykeyword at the admin 
interface. What do I have to do so that I can search with just mykeyword?
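
As the replies below point out, bare-keyword search needs a default search 
field. A sketch of the two schema lines involved — assuming you want the 
wildcard copyField target text to be that default:

  <copyField source="*" dest="text"/>
  <defaultSearchField>text</defaultSearchField>

Remember to re-crawl and re-index after the schema change, since copyField is 
applied at index time.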


Regards,


Re: copyField

2012-05-18 Thread Tolga
I'll make sure to do that. Thanks

 Sent from myPhone

On 18 May 2012, at 17:40, Jack Krupansky 
j...@basetechnology.com wrote:

 Did you also delete all existing documents from the index? Maybe your crawl 
 did not re-index documents that were already in the index or that hadn't 
 changed since the last crawl, leaving the old index data as it was before the 
 change.
 
 -- Jack Krupansky
 
 -Original Message- From: Tolga
 Sent: Friday, May 18, 2012 9:54 AM
 To: solr-user@lucene.apache.org
 Subject: copyField
 
 Hi,
 
 I've put the line <copyField source="*" dest="text" stored="true"
 indexed="true"/> in my schema.xml and restarted Solr, crawled my
 website, and indexed (I've also committed, but do I really have to
 commit?). But I still have to search with content:mykeyword at the admin
 interface. What do I have to do so that I can search with just mykeyword?
 
 Regards, 


Re: copyField

2012-05-18 Thread Tolga
Default field? I'm not sure but I think I do. Will have to look. 

 Sent from myPhone

On 18 May 2012, at 18:11, Yury Kats yuryk...@yahoo.com 
wrote:

 On 5/18/2012 9:54 AM, Tolga wrote:
 Hi,
 
 I've put the line <copyField source="*" dest="text" stored="true"
 indexed="true"/> in my schema.xml and restarted Solr, crawled my
 website, and indexed (I've also committed, but do I really have to
 commit?). But I still have to search with content:mykeyword at the admin
 interface. What do I have to do so that I can search with just mykeyword?
 
 Do you have the default field defined?
 


Re: copyField

2012-05-18 Thread Tolga
Oh this one. Yes I have it. 

 Sent from myPhone

On 18 May 2012, at 23:14, Yury Kats yuryk...@yahoo.com 
wrote:

 On 5/18/2012 4:02 PM, Tolga wrote:
 Default field? I'm not sure but I think I do. Will have to look. 
 
 http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field


curl or nutch

2012-05-16 Thread Tolga

Hi,

I have been trying for a week. I really want to get a start, so what 
should I use? curl or nutch? I want to be able to index pdf, xml etc. 
and search within them as well.


Regards,


Re: curl or nutch

2012-05-16 Thread Tolga

Can nutch crawl/index files as well?

On 5/16/12 12:29 PM, findbestopensource wrote:

You could very well use Solr. It has support for indexing PDF and XML
files. If you want to index websites and search using PageRank, then choose
Nutch.

Regards
Aditya
www.findbestopensource.com


On Wed, May 16, 2012 at 1:13 PM, Tolgato...@ozses.net  wrote:


Hi,

I have been trying for a week. I really want to get a start, so what
should I use? curl or nutch? I want to be able to index pdf, xml etc. and
search within them as well.

Regards,



Index an URL

2012-05-15 Thread Tolga

Hi,

I have a few questions, please bear with me:

1- I have a theory: Nutch may be used to index to Solr when we don't 
have access to the URL's file system, while we can use curl when we do have 
access. Am I correct?
2- A tutorial I have been reading talks about different levels of 
id. Is there such a thing (exid6, exid7, etc.)?
3- When I use curl 
"http://localhost:8983/solr/update/extract?literal.id=exid7&commit=true" 
-F "myfile=@serialized-form.html", I get ERROR: [doc=exid7] unknown 
field 'ignored_link'. Is this something exid7 gives me? Where does 
the field ignored_link come from? Do I need to add all these fields to 
schema.xml in order not to get such errors? What is the safest way?
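
On question 3: ignored_link does not come from the id; it comes from the 
extract handler's field mappings in the example solrconfig.xml, which prefix 
unknown extracted fields with ignored_ and expect a matching dynamic field in 
the schema. A sketch of the usual ways around the error — assumptions about 
the example config, not verified in this thread — is to add

  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>

to schema.xml (with an "ignored" field type that is neither indexed nor 
stored), or to remap on the request itself with parameters such as 
&uprefix=ignored_&fmap.content=text.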


Regards,


Re: Fwd: Delete documents

2012-05-11 Thread Tolga

That worked, thanks a lot Jack :)

On 5/11/12 7:44 AM, Jack Krupansky wrote:
Try using the actual id of the document rather than the shell 
substitution variable - if you're trying to delete one document.


To delete all documents, use delete by query:

<delete><query>*:*</query></delete>

See:
http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F 
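
(Posting that delete-by-query by hand looks roughly like this — a sketch 
assuming the default /update handler:

  curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
)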



-- Jack Krupansky

-Original Message- From: Tolga
Sent: Friday, May 11, 2012 12:31 AM
To: solr-user@lucene.apache.org
Subject: Fwd: Delete documents

Anyone at all?

 Original Message 
Subject: Delete documents
Date: Thu, 10 May 2012 22:59:49 +0300
From: Tolga to...@ozses.net
To: solr-user@lucene.apache.org



Hi,
I've been reading
http://lucene.apache.org/solr/api/doc-files/tutorial.html and, in the
section Deleting Data, I've edited schema.xml to include a field named
id, issued the command for f in *; do java -Ddata=args -Dcommit=yes -jar
post.jar "<delete><id>$f</id></delete>"; done, and went on to the stats page
only to find that no files were de-indexed. How can I do that?

Regards,



Error messages

2012-05-10 Thread Tolga

Hi,

Apache servers are returning my post with the status messages
HTML_FONT_SIZE_HUGE,HTML_MESSAGE,HTTP_ESCAPED_HOST,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL,URI_HEX,WEIRD_PORT. 
I've tried clearing all formatting and a re-post, but the same thing 
occurred. What to do?


Regards,


Delete documents

2012-05-10 Thread Tolga

Hi,
I've been reading 
http://lucene.apache.org/solr/api/doc-files/tutorial.html and, in the 
section Deleting Data, I've edited schema.xml to include a field named 
id, issued the command for f in *; do java -Ddata=args -Dcommit=no -jar 
post.jar "<delete><id>$f</id></delete>"; done, and went on to the stats page 
only to find that no files were de-indexed. How can I do that?


Regards,


Delete data

2012-05-10 Thread Tolga

Sorry, commit=no should have been commit=yes in my previous post.

Regards,


Fwd: Delete documents

2012-05-10 Thread Tolga

Anyone at all?

 Original Message 
Subject:Delete documents
Date:   Thu, 10 May 2012 22:59:49 +0300
From:   Tolga to...@ozses.net
To: solr-user@lucene.apache.org



Hi,
I've been reading
http://lucene.apache.org/solr/api/doc-files/tutorial.html and, in the
section Deleting Data, I've edited schema.xml to include a field named
id, issued the command for f in *; do java -Ddata=args -Dcommit=yes -jar
post.jar "<delete><id>$f</id></delete>"; done, and went on to the stats page
only to find that no files were de-indexed. How can I do that?

Regards,



Re: CLASSPATH

2012-05-09 Thread Tolga

Otis,

I've just subscribed to the Nutch mailing list; however, it seems to be a very 
low-volume one (at least that's the impression I got), so can't I ask here?


Regards,

On 5/8/12 11:54 PM, Otis Gospodnetic wrote:

Tolga - you should ask on the Nutch mailing list, not Solr one. :)

Otis 


Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm





From: Tolgato...@ozses.net
To: solr-user@lucene.apache.org
Sent: Tuesday, May 8, 2012 4:30 PM
Subject: CLASSPATH

Hi,

Probably off-topic, but what directory should I export to CLASSPATH environment 
variable so that I can begin using nutch?

Regards,





CLASSPATH

2012-05-08 Thread Tolga

Hi,

Probably off-topic, but what directory should I add to the CLASSPATH 
environment variable so that I can begin using Nutch?


Regards,


PDF indexing

2012-05-07 Thread Tolga

Hi,

From what I have read, I think I have to use Tika (?) to index PDF, 
xls, doc, etc. files. How do I start? Do I use mvn clean install in the 
source directory to get all the jar files to begin? CentOS doesn't 
provide mvn; how do I build Tika after getting Maven from 
http://maven.apache.org ?


Sorry for the noob questions, I'm just beginning.


Re: PDF indexing

2012-05-07 Thread Tolga

On 05/07/2012 10:35 PM, Jack Krupansky wrote:

Try SolrCell (ExtractingRequestHandler).

See:
http://wiki.apache.org/solr/ExtractingRequestHandler

-- Jack Krupansky

-Original Message- From: Tolga Sent: Monday, May 07, 2012 3:24 
PM To: solr-user@lucene.apache.org Subject: PDF indexing

Hi,

From what I have read, I think I have to use Tika (?) to index PDF, 
xls, doc, etc files. How do I start? Do I use mvn clean install in the 
source directory to get all the jar files to begin? Centos doesn't 
provide mvn, how do I build Tika after getting it from 
http://maven.apache.org ?


Sorry for the noob questions, I'm just beginning.

Jack,

Thank you very much, I've managed to index a PDF file after a few tries. 
With this curl syntax, would it be possible to index an XML file as well, 
or do we need to use java -jar post.jar file.xml? Or let me put it this 
way: how is post.jar different from curl?
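
For comparison — a sketch, with placeholder file names: post.jar simply POSTs 
a ready-made Solr add/delete XML document to /update, while the extract 
handler runs the raw file through Tika first:

  java -jar post.jar file.xml
  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@file.pdf"

So an XML file already in Solr's add/doc format goes through post.jar (or 
curl against /update), and a PDF or arbitrary HTML goes through /update/extract.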


Regards,


Re: Direct control over document position in search results

2009-02-25 Thread Ercan, Tolga
I looked at that; elevate is a way to boost particular documents based on the query 
terms used. I was thinking in a more general sense... For instance, when Google 
displays search results, the 4th result (typically) is news results, then YouTube 
results come in at another fixed position or better... This is not based 
on query terms, but appears to be based on a document-type metadata field. We 
can certainly create the metadata in Solr, but I can't seem to figure out how 
to manipulate the search results to the extent I need.


On 2/24/09 9:12 AM, Steven A Rowe sar...@syr.edu wrote:

Hi Tolga,

Here's a good place to start:

http://wiki.apache.org/solr/QueryElevationComponent
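
For context, the elevation component is driven by an elevate.xml that pins 
specific document ids to the top for a given query text — a sketch with 
placeholder ids:

  <elevate>
    <query text="widget video">
      <doc id="video-doc-1"/>
    </query>
  </elevate>

which is also why it is keyed on query terms rather than on a doctype field, 
as noted in the reply above.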

Steve

On 2/23/2009 at 7:47 PM, Ercan, Tolga wrote:
 I was wondering if there was any facility to directly manipulate search
 results based on business criteria to place documents at a fixed
 position in those results. For example, when I issue a query, the first
 four results would be based on natural search relevancy, then the fifth
 result would be based on the most relevant document when doctype:video
 (if I had a doctype field of course), then results 6...* would resume
 natural search relevancy?

 Or perhaps a variation on this: the document where doctype:video
 would appear at a fixed position or better... For example, if somebody
 searched for "my widget video", there would be a relevant document at a
 higher position than #5...



Direct control over document position in search results

2009-02-23 Thread Ercan, Tolga
Hello,

I was wondering if there was any facility to directly manipulate search results 
based on business criteria to place documents at a fixed position in those 
results. For example, when I issue a query, the first four results would be 
based on natural search relevancy, then the fifth result would be based on the 
most relevant document when doctype:video (if I had a doctype field of course), 
then results 6...* would resume natural search relevancy?

Or perhaps a variation on this: the document where doctype:video would 
appear at a fixed position or better... For example, if somebody searched for 
"my widget video", there would be a relevant document at a higher position than 
#5...

Thanks!
~t