Re: solr+jetty logging to syslog?

2009-11-24 Thread Otis Gospodnetic
Not many people do that, judging from 
http://www.google.com/search?&q=+solr%20+syslogd .

But I think this is really not a Solr-specific question.  Isn't the question 
really "how do I configure log4j to log to syslogd?".  Oh, and then "how do I 
configure slf4j to use log4j?"

The answer to the first one is "by using SyslogAppender" (Google says so).
The answer to the second one might be on
http://fernandoribeiro.eti.br/2006/05/24/how-to-use-slf4j-with-log4j/
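
A minimal log4j.properties sketch along those lines (the syslog host and
facility here are placeholders, and you'd put slf4j-log4j12 plus log4j on
the classpath in place of the slf4j-jdk14 binding):

log4j.rootLogger=INFO, SYSLOG
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=localhost
log4j.appender.SYSLOG.Facility=LOCAL1
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=solr: %-5p %c - %m%n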
 
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Steve Conover 
> To: solr-user@lucene.apache.org
> Sent: Sat, November 21, 2009 4:09:57 PM
> Subject: Re: solr+jetty logging to syslog?
> 
> Does no one send solr logging to syslog?
> 
> On Thu, Nov 19, 2009 at 5:54 PM, Steve Conover wrote:
> > The solution involves slf4j to log4j to syslog (at least, for solr),
> > but I'm having some trouble stringing all the parts together.  If
> > anyone is doing this, would you mind posting how you use slf4j-log4j
> > jar, what your log4j.properties looks like, what your java system
> > properties settings are, and anything else you think is relevant?
> >
> > Much appreciated
> >
> > -Steve
> >



Re: Migrating to Solr

2009-11-24 Thread Otis Gospodnetic
Except http://sesat.no/ hasn't been reachable for about 2 days now. Google
cache to the rescue!

Otis

- Original Message 

> From: Shashi Kant 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 10:05:30 AM
> Subject: Re: Migrating to Solr
> 
> Here is a link that might be helpful:
> 
> http://sesat.no/moving-from-fast-to-solr-review.html
> 
> The site is choc-a-bloc with great information on their migration
> experience.
> 
> 
> On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto wrote:
> 
> > Hi,
> >
> > I'm new at Solr and i need to make a "test pilot" of a migration from Fast
> > ESP to Apache Solr, anyone had this experience before?
> >
> >
> > Att,
> >



Re: [SolrResourceLoader] Unable to load cached class-name

2009-11-24 Thread Otis Gospodnetic
Oh, and regarding the log4j Solr appender, could you please contribute it to 
log4j? http://logging.apache.org/log4j/1.2/index.html

That way it will get more user exposure and developer/maintenance love.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Stuart Grimshaw 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 4:39:37 PM
> Subject: [SolrResourceLoader] Unable to load cached class-name
> 
> Bit of a long error message, so I won't post it all in the subject :-)
> 
> I'm trying to create a log4j solr appender to help us track down log
> entries from across our jboss cluster, I might be able to make use of
> the faceted search to identify errors that occur more often and things
> like that.
> 
> Anyway, on to my problem, you can see the source on github
> http://github.com/Stubbs/solrIndexAppender
> 
> I've deployed the contents of dist/ into JBoss's lib directory for the
> server I'm running and I've also copied the contents of lib/ into
> there as well. I've also copied the solrj libs into there too, but I
> get the following error:
> 
> [SolrResourceLoader] Unable to load cached class-name :
> org.apache.solr.search.FastLRUCache for shortname : solr.FastLRUCache
> java.lang.ClassNotFoundException: org.apache.solr.search.FastLRUCache
> 
> I've seen posts that suggest this is because of using 1.3 libs, but
> the only 1.3 libs I have are in my maven repo and are not deployed.
> 
> -S
> 
> Follow me on Twitter: http://twitter.com/stubbs
> Blog: http://stubblog.wordpress.com
> My art: http://stuartgrimshaw.imagekind.com
> Stock Images: http://en.fotolia.com/partner/16775



Re: [SolrResourceLoader] Unable to load cached class-name

2009-11-24 Thread Otis Gospodnetic
Hi Stuart,

I don't understand your last paragraph, but yes, that class is not in Solr 1.3.
It is in Solr 1.4, and Solr 1.4 is available in the Apache Maven repo.
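
If it helps, a dependency sketch for pulling it from there (the published
1.4.0 coordinates):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>1.4.0</version>
</dependency>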

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Stuart Grimshaw 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 4:39:37 PM
> Subject: [SolrResourceLoader] Unable to load cached class-name
> 
> Bit of a long error message, so I won't post it all in the subject :-)
> 
> I'm trying to create a log4j solr appender to help us track down log
> entries from across our jboss cluster, I might be able to make use of
> the faceted search to identify errors that occur more often and things
> like that.
> 
> Anyway, on to my problem, you can see the source on github
> http://github.com/Stubbs/solrIndexAppender
> 
> I've deployed the contents of dist/ into JBoss's lib directory for the
> server I'm running and I've also copied the contents of lib/ into
> there as well. I've also copied the solrj libs into there too, but I
> get the following error:
> 
> [SolrResourceLoader] Unable to load cached class-name :
> org.apache.solr.search.FastLRUCache for shortname :
> solr.FastLRUCachejava.lang.ClassNotFoundException:
> org.apache.solr.search.FastLRUCache
> 
> I've seen posts that suggest this is because of usuing 1.3 libs, but
> the only 1.3 libs I have are in my maven repo and are not deployed.
> 
> -S
> 
> Follow me on Twitter: http://twitter.com/stubbs
> Blog: http://stubblog.wordpress.com
> My art: http://stuartgrimshaw.imagekind.com
> Stock Images: http://en.fotolia.com/partner/16775



Re: Deduplication in 1.4

2009-11-24 Thread Otis Gospodnetic
Hi,

As far as I know, the point of deduplication in Solr ( 
http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document 
before indexing it in order to avoid duplicates in the index in the first place.
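
For reference, the wiki's setup is an update processor chain along these
lines (a sketch; the field names below are taken from your message, the
rest is the stock example):

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">duplicate_signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">duplicate_group_id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Note that with overwriteDupes=true the duplicates are deleted at update
time; there is no query-time rollup involved.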

What you are describing is closer to the field collapsing patch in SOLR-236.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: KaktuChakarabati 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 24, 2009 5:29:00 PM
> Subject: Deduplication in 1.4
> 
> 
> Hey,
> I've been trying to find some documentation on using this feature in 1.4 but
> the Wiki page is a little sparse..
> Specifically, here's what i'm trying to do:
> 
> I have a field, say 'duplicate_group_id' that i'll populate based on some
> offline documents deduplication process I have.
> 
> All I want is for solr to compute a 'duplicate_signature' field based on
> this one at update time, so that when i search for documents later, all
> documents with same original 'duplicate_group_id' value will be rolled up
> (e.g i'll just get the first one that came back  according to relevancy).
> 
> I enabled the deduplication processor and put it into updater, but i'm not
> seeing any difference in returned results (i.e results with same
> duplicate_id are returned separately..)
> 
> is there anything i need to supply in query-time for this to take effect?
> what should be the behaviour? is there any working example of this?
> 
> Anything will be helpful..
> 
> Thanks,
> Chak
> -- 
> View this message in context: 
> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to do partial word searches?

2009-11-24 Thread Erick Erickson
copying from Erik Hatcher:

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.
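
(A common workaround for infix matching, separate from the wildcard issue,
is to index n-grams so that a query like "sulli" becomes a plain term
query; a sketch, assuming solr.NGramFilterFactory:)

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>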

There's a pretty extensive recent exchange on this, see the
thread on the user's list titled

"leading and trailing wildcard query"

Best
Erick

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund  wrote:

> Hi, I saw some older postings on this, but didnt see a resolution.
>
> I have a field called title, I would like to be able to find partial word
> matches within the title.
>
> For example:
>
> http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22
>
> I would expect it to find:
> the daily dish | by andrew sullivan
>
> but it doesnt, it does find sully (which is fine with me also as a bonus),
> but doesnt seem to get any of the partial word stuff. Oddly enough before I
> lowercased the title, the wildcard matching seemed to work a bit better, it
> just didnt deal with the case sensitive query.
>
> At first I had mixed case titles and I read that the wildcard doesn't work
> with mixed case, so I created another field that is a lowered version of the
> title called "textTitle", it is of type text.
>
> Is it possible with solr to achieve what I am trying to do, if so how? If
> not, anything closer than what I have?
>
> thanks
> Joel
>
>


Re: Implementing phrase autopop up

2009-11-24 Thread darniz

can anybody update me on whether, if a word within a phrase is matched,
that phrase can be displayed?

darniz

darniz wrote:
> 
> Thanks for your input
> You made a valid point, if we are using field type as text to get
> autocomplete it wont work because it goes through tokenizer.
> Hence looks like for my use case i need to have a field which uses ngram
> and copy. Here is what i did
> 
> i created a field the same as the lucid blog says.
> 
> <field name="autocomp" type="autocomplete" indexed="true" stored="true"
> omitNorms="true" omitTermFreqAndPositions="true"/>
> 
> with the following field configuration
> 
> <fieldType name="autocomplete" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> Now when i query i get the correct phrases, for example if i search for 
> autocomp:"how to" i get all the correct phrases like
> 
> How to find a car
> How to find a mechanic 
> How to choose the right insurance company
> 
> etc... which is good.
> 
> Now I have two questions.
> 1) Is it necessary to give the query in quotes? My gut feeling is yes,
> since if you dont give quotes i get phrases beginning with How followed by
> some other words, like How can etc...
> 
> 2) if i search for a word, for example choose, it gives me nothing.
> I was expecting to see a result considering there is the word "choose" in
> the phrase 
> How to choose the right insurance company
> 
> i might look more at documentation but do you have anything to advise?
> 
> darniz
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Shalin Shekhar Mangar wrote:
>> 
>> On Tue, Nov 24, 2009 at 10:12 AM, darniz  wrote:
>> 
>>>
>>> hello all
>>> Let me first explain the task i am trying to do.
>>> i have article with title for example
>>> 
>>> Car Insurance for Teenage Drivers
>>>
>>> A Total Loss?
>>> 
>>> If a user begins to type car insu i want the autopop to show up with the
>>> entire phrase.
>>> There are two ways to implement this.
>>> First is to use the termcomponent and the other is to use a field with
>>> field
>>> type which uses the solr.EdgeNGramFilterFactory filter.
>>>
>>> I started with using with Term component and i declared a term request
>>> handler and gave the following query
>>>
>>> http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
>>> The issue is that its not giving the entire phrase, it gives me back
>>> results
>>> like car, caravan, carbon. Now  i know using terms.prefix will only give
>>> me
>>> results where the sentence start with car. On top of this i also want if
>>> there is word like car somewhere in between the title that should also
>>> show
>>> up in autopop very much similar like google where a word is not
>>> necessarily
>>> start at the beginning but it could be present anywhere in the middle of
>>> the
>>> title.
>>> The question is whether TermComponent is a good candidate, or whether to
>>> use a custom field (let's name it autoPopupText) with a field type
>>> configured with all the filters and EdgeNGramFilterFactory defined,
>>> copying the title to the autoPopupText field and using it to power the
>>> autopopup.
>>>
>>> The other thing is that using EdgeNGramFilterFactory is more from an index
>>> point of view when you index document you need to know which fields you
>>> want
>>> to copy to autoPopupText field where as using Term component is more
>>> like
>>> you can define at query time what fields you want to use to fetch
>>> autocomplete from.
>>>
>>> Any idea whats the best and why the Term component is not giving me an
>>> entire phrase which i mentioned earlier.
>>> FYI
>>> my title field is of type text.
>>>
>> 
>> 
>> You are using a tokenized field type with TermsComponent therefore each
>> word
>> in your phrase gets indexed as a separate token. You should use a
>> non-tokenized type (such as a string type) with TermsComponent. However,
>> this will only let you search by prefix and not by words in between the
>> phrase.
>> 
>> Your best bet here would be to use EdgeNGramFilterFactory. If your index
>> is
>> very large, you can consider doing a prefix search on shingles too.
>> 
>> -- 
>> Regards,
>> Shalin Shekhar Mangar.
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Implementing-phrase-autopop-up-tp26490419p26506470.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: configure solr

2009-11-24 Thread Joel Nylund
for #1, under example, is there a webapps folder, and does it contain
solr.war? Are there any errors in your startup log for jetty? Does it
say anything about setting up solr, solr home, etc.?


Joel

On Nov 24, 2009, at 4:55 PM, Jill Han wrote:


Hi,

I just downloaded solr-1.4.0 to my computer, C:\apache-solr-1.4.0.

1.I followed the instruction to run the sample, java -jar
start.jar at C:\apache-solr-1.4.0\example

And then go to http://localhost:8983/solr/admin, however, I got


HTTP ERROR: 404

   NOT_FOUND

RequestURI=/solr/admin

Powered by jetty:// 

Did I miss something?

2.   Since I can't get the sample to run, I tried to run it on tomcat
server (5.5) directly as

a.   Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps,

b.   Go to http://localhost:8080/apache-solr-1.4.0/

The error message is "HTTP Status 500 - Severe errors in solr
configuration..."

3.   How to configure it on tomcat server?

Your help is appreciated very much as always,

Jill









how to do partial word searches?

2009-11-24 Thread Joel Nylund

Hi, I saw some older postings on this, but didnt see a resolution.

I have a field called title, I would like to be able to find partial  
word matches within the title.


For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
the daily dish | by andrew sullivan

but it doesnt, it does find sully (which is fine with me also as a  
bonus), but doesnt seem to get any of the partial word stuff. Oddly  
enough before I lowercased the title, the wildcard matching seemed to  
work a bit better, it just didnt deal with the case sensitive query.


At first I had mixed case titles and I read that the wildcard doesn't  
work with mixed case, so I created another field that is a lowered  
version of the title called "textTitle", it is of type text.


Is it possible with solr to achieve what I am trying to do, if so how?  
If not, anything closer than what I have?


thanks
Joel



Re: configure solr

2009-11-24 Thread Erick Erickson
For the second question, do the instructions here help?
http://wiki.apache.org/solr/SolrTomcat

I suspect your SOLR instance doesn't know where to find the SOLR
config files. So a severe error, indeed. It can't find them at all.

WARNING: I'm *really* not a tomcat expert, and the instructions
at the URL are for Tomcat 6x. But they might give you a clue if
you're reasonably tomcat-savvy.
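
The gist of those instructions is a context fragment that tells Tomcat
where the war and the Solr home live; a sketch (paths are placeholders),
saved as conf/Catalina/localhost/solr.xml:

<Context docBase="C:/apache-solr-1.4.0/dist/apache-solr-1.4.0.war"
         debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="C:/apache-solr-1.4.0/example/solr" override="true"/>
</Context>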

HTH
Erick

On Tue, Nov 24, 2009 at 4:55 PM, Jill Han  wrote:

> Hi,
>
> I just downloaded solr-1.4.0 to my computer, C:\apache-solr-1.4.0.
>
> 1.I followed the instruction to run the sample, java -jar
> start.jar at C:\apache-solr-1.4.0\example
>
> And then go to http://localhost:8983/solr/admin, however, I got
>
>
> HTTP ERROR: 404
>
>NOT_FOUND
>
> RequestURI=/solr/admin
>
> Powered by jetty:// 
>
> Did I miss something?
>
> 2.   Since I can't get the sample to run, I tried to run it on tomcat
> server (5.5) directly as
>
> a.   Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps,
>
> b.   Go to http://localhost:8080/apache-solr-1.4.0/
>
> The error message is "HTTP Status 500 - Severe errors in solr
> configuration..."
>
> 3.   How to configure it on tomcat server?
>
> Your help is appreciated very much as always,
>
> Jill
>
>
>
>
>
>


Re: Index Splitter

2009-11-24 Thread Koji Sekiguchi

Giovanni Fernandez-Kincade wrote:

Hi,
I've heard about a tool that can be used to split Lucene indexes, for cases 
where you want to break up a large index into shards. Do you know where I can 
find it? Any observations/recommendations about its use?

This seems promising but I'm not sure if there is anything more mature out 
there:
http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html

Thanks,
Gio.

  

There are IndexSplitter and MultiPassIndexSplitter tools in 3.0.

https://issues.apache.org/jira/browse/LUCENE-1959

I'd written an article about them before:

http://lucene.jugem.jp/?eid=344

It is in Japanese, but I think you can work out how to use them from the
command lines...
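
From memory, the invocations look roughly like this (a sketch; check the
contrib/misc jar for the exact flags):

java -cp lucene-core-3.0.0.jar:lucene-misc-3.0.0.jar \
  org.apache.lucene.index.MultiPassIndexSplitter -out /path/to/out -num 4 /path/to/index

java -cp lucene-core-3.0.0.jar:lucene-misc-3.0.0.jar \
  org.apache.lucene.index.IndexSplitter /path/to/index -l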


Koji

--
http://www.rondhuit.com/en/



Re: how is score computed with hsin functionquery?

2009-11-24 Thread gdeconto


gdeconto wrote:
> 
> ...
> is there some way to convert the hsin value to distance?
> ...
> 

I just noticed that the solr wiki states "Values must be in Radians" and all
my test values were in degrees.
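
For anyone else hitting this, the conversion is just degrees * pi / 180,
e.g. in Java:

// convert the test coordinates from degrees to radians for hsin
double lat = Math.toRadians(45.67890);    // ~0.7972
double lon = Math.toRadians(-123.456789); // ~-2.1547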

-- 
View this message in context: 
http://old.nabble.com/how-is-score-computed-with-hsin-functionquery--tp26504265p26505091.html
Sent from the Solr - User mailing list archive at Nabble.com.



Index Splitter

2009-11-24 Thread Giovanni Fernandez-Kincade
Hi,
I've heard about a tool that can be used to split Lucene indexes, for cases 
where you want to break up a large index into shards. Do you know where I can 
find it? Any observations/recommendations about its use?

This seems promising but I'm not sure if there is anything more mature out 
there:
http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html

Thanks,
Gio.


Deduplication in 1.4

2009-11-24 Thread KaktuChakarabati

Hey,
I've been trying to find some documentation on using this feature in 1.4 but
the Wiki page is a little sparse..
Specifically, here's what i'm trying to do:

I have a field, say 'duplicate_group_id' that i'll populate based on some
offline documents deduplication process I have.

All I want is for solr to compute a 'duplicate_signature' field based on
this one at update time, so that when i search for documents later, all
documents with same original 'duplicate_group_id' value will be rolled up
(e.g i'll just get the first one that came back  according to relevancy).

I enabled the deduplication processor and put it into updater, but i'm not
seeing any difference in returned results (i.e results with same
duplicate_id are returned separately..)

is there anything i need to supply in query-time for this to take effect?
what should be the behaviour? is there any working example of this?

Anything will be helpful..

Thanks,
Chak
-- 
View this message in context: 
http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
Sent from the Solr - User mailing list archive at Nabble.com.



how is score computed with hsin functionquery?

2009-11-24 Thread gdeconto

I was looking at functionqueries, and noticed that:

1. if I use the sum functionquery, the score in the results is the sum of
the values I want to sum (all well and good and expected):

http://127.0.0.1:8080/solr/select?q=(*:*)^0%20%20_val_:"sum(1,2,3,4,5)"&fl=score,Latitude,Longitude&sort=score%20asc

2. if I use the hsin functionquery (i.e.
hsin(45.67890,-123.456789,Latitude,Longitude,10)):

http://127.0.0.1:8080/solr/select?q=(*:*)^0%20%20_val_:"hsin(45.67890,-123.456789,Latitude,Longitude,10)"&fl=score,Latitude,Longitude&sort=score%20asc

assuming this is not a quirk in 1.5, is there some way to convert the hsin
value to distance?

thx
-- 
View this message in context: 
http://old.nabble.com/how-is-score-computed-with-hsin-functionquery--tp26504265p26504265.html
Sent from the Solr - User mailing list archive at Nabble.com.



why is XMLWriter declared as final?

2009-11-24 Thread Matt Mitchell
Is there any reason the XMLWriter is declared as final? I'd like to extend
it for a special case but can't. The other writers (ruby, php, json) are not
final.

Thanks,
Matt


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread Lance Norskog
If you are using multicore, you have to run Luke on a particular core:

http://machine:port/solr/core/admin/luke

And, admin itself:

http://machine:port/solr/core/admin

On Tue, Nov 24, 2009 at 10:18 AM, javaxmlsoapdev  wrote:
>
> Following is the luke response. The fields list is empty. can someone
> assist to find out why file content isn't being indexed?
>
>  responseHeader: status 0, QTime 0
>  index: numDocs 0, maxDoc 0, numTerms 0, version 1259085661332,
>  optimized false, current true, hasDeletions false
>  directory:
>  org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index
>  lastModified: 2009-11-24T18:01:01Z
>  
>  
>  
>  
>  Indexed
>  Tokenized
>  Stored
>  Multivalued
>  TermVector Stored
>  Store Offset With TermVector
>  Store Position With TermVector
>  Omit Norms
>  Lazy
>  Binary
>  Compressed
>  Sort Missing First
>  Sort Missing Last
>  
>  Document Frequency (df) is not updated when a document is
> marked for deletion. df values include deleted documents.
>  
>  
>
> javaxmlsoapdev wrote:
>>
>> I was able to configure /docs index separately from my db data index.
>>
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new schema)
>>
>> below are the only two fields I have in schema.xml
>> <field name="key" type="string" indexed="true" stored="true" required="true" />
>> <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
>>
>> Following is updated code from test case
>>
>> File fileToIndex = new File("file.txt");
>>
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>>
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>>   702
>>   text/plain
>>   doc123.txt
>>   
>>   
>>   8978
>>   
>>   
>>
>> Any idea?
>>
>> Thanks,
>>
>>
>> javaxmlsoapdev wrote:
>>>
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>>
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index separately
>>> from database entity index. what's the best way to do this? If I don't
>>> have any db columns etc to map and the file documents index should live
>>> separate from db entity index, what's the best way to achieve this.
>>>
>>> thanks,
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:


 On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:

>
> *:* returns me 1 count but when I search for specific word (which was
> part of
> .txt file I indexed before) it doesn't return me anything. I don't have
> luke
> setup on my end.

http://localhost:8983/solr/admin/luke should give you some info.


> let me see if I can set that up quickly but otherwise do
> you see anything I am missing in solrconfig mapping or something?

 What's your schema look like and how are you querying?

> which maps
> document "content" to wrong attribute?
>
> thanks,
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>
>>>
>>> Following code is from my test case where it tries to index a file
>>> (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>> server.request(up);
>>>
>>> test case doesn't give me any error and "I think" its indexing the
>>> file?
>>> but
>>> when I search for a text (which was part of the .txt file) search
>>> doesn't
>>> return me anything.
>>
>> What do your logs show?  Else, what does Luke show or doing a *:*
>> query
>> (assuming this is the only file you added)?
>>
>> Also, I don't think you need ext.literal anymore, just literal.
>>
>>>
>>> Following is the config from solrconfig.xml where I have mapped
>>> content
>>> to
>>> "description" field(default search field) in the schema.
>>>
>>> >> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>   
>>>     des

Re: Creating Facets

2009-11-24 Thread Lance Norskog
There is nothing special to configure. All facet processing happens at
query time.
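
For example, it is all request parameters (the field name here is just an
example):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.mincount=1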

On Tue, Nov 24, 2009 at 9:56 AM, Tommy Molto  wrote:
> People,
>
> I looked in the solr wiki and only found docs about using facets, not how
> to configure it in the schema or solrconfig. Any tip how to do it?
>
> Att,
>



-- 
Lance Norskog
goks...@gmail.com


configure solr

2009-11-24 Thread Jill Han
Hi,

I just downloaded solr-1.4.0 to my computer, C:\apache-solr-1.4.0.

1.I followed the instruction to run the sample, java -jar
start.jar at C:\apache-solr-1.4.0\example

And then go to http://localhost:8983/solr/admin, however, I got 


HTTP ERROR: 404

NOT_FOUND

RequestURI=/solr/admin

Powered by jetty://  

Did I miss something?

2.   Since I can't get the sample to run, I tried to run it on tomcat
server (5.5) directly as

a.   Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps,

b.   Go to http://localhost:8080/apache-solr-1.4.0/

The error message is "HTTP Status 500 - Severe errors in solr
configuration..."

3.   How to configure it on tomcat server? 

Your help is appreciated very much as always,

Jill

 

 



Re: Migrating to Solr

2009-11-24 Thread Lance Norskog
Collections in FAST do not exist in Solr. A FAST collection can be
implemented in Solr using facets or shards. The collection abstraction
in FAST is actually more shard-like in semantics: it is a separate
top-level set of content. This has strong ramifications for relevance:
if collections have the same relevance "statistical footprint", they
can go in the same shard. If they have different relevance
characteristics they should go in different shards.

Example: if book collections and movie title collections share one
shard, relevance calculations are completely bogus. They should go
into 2 separate shards and with different search tuning.

I did one conversion during Solr 1.2. I wound up mass-editing all of
the XML data files into Solr's XML input format. I cannot recommend
this technique.
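
For reference, Solr's XML input format looks like this (field names are
schema-specific):

<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">Example title</field>
  </doc>
</add>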

(Note: since FAST charges by query, doing a deep walk and uploading to
Solr was not financially feasible.)

In general, expect to do a test conversion and then redesign your
schema and search strategies for your "real" conversion. Solr has a
lot of subtleties.

On Tue, Nov 24, 2009 at 8:11 AM, Tommy Molto  wrote:
> This is really a great resource on migration. I guess i will have good
> questions after trying. But what i know will be a little harder is
> the use of collections (facets in Solr) and hierarchical navigators.
>
> On Tue, Nov 24, 2009 at 1:05 PM, Shashi Kant  wrote:
>
>> Here is a link that might be helpful:
>>
>> http://sesat.no/moving-from-fast-to-solr-review.html
>>
>> The site is choc-a-bloc with great information on their migration
>> experience.
>>
>>
>> On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto  wrote:
>>
>> > Hi,
>> >
>> > I'm new at Solr and i need to make a "test pilot" of a migration from
>> Fast
>> > ESP to Apache Solr, anyone had this experience before?
>> >
>> >
>> > Att,
>> >
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Multi-Term Synonyms

2009-11-24 Thread brad anderson
Thanks for the help. Can't believe I missed that part in the wiki.

2009/11/24 Tom Hill 

> Hi Brad,
>
>
> I suspect that this section from the wiki for SynonymFilterFactory might be
> relevant:
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> *"Keep in mind that while the SynonymFilter will happily work with synonyms
> containing multiple words (ie: "**sea biscuit, sea biscit, seabiscuit**")
> The recommended approach for dealing with synonyms like this, is to expand
> the synonym when indexing. This is because there are two potential issues
> that can arrise at query time:*
>
>   1.
>
>   *The Lucene QueryParser tokenizes on white space before giving any text
>   to the Analyzer, so if a person searches for the words **sea biscit** the
>   analyzer will be given the words "sea" and "biscit" seperately, and will
> not
>   know that they match a synonym."*
>
>   ...
>
> Tom
>
On Tue, Nov 24, 2009 at 10:47 AM, brad anderson wrote:
>
> > Hi Folks,
> >
> > I was trying to get multi term synonyms to work. I'm experiencing some
> > strange behavior and would like some feedback.
> >
> > In the synonyms file I have the line:
> >
> > thomas, boll holly, thomas a, john q => tom
> >
> > And I have a document with the text field as;
> >
> > tom
> >
> > However, when I do a search on boll holly, it does not return the
> document
> > with tom. The same thing happens if I do a query on john q. But if I do a
> > query on thomas, it gives me the document. Also, if I quote "boll holly"
> or
> > "john q" it gives back the document.
> >
> > When I look at the analyzer page on the solr admin page, it is
> transforming
> > "boll holly" to "tom" when it isn't quoted. Why is it that it is not
> > returning the document? Is there some configuration I can make so it does
> > return the document if I do an unquoted search on "boll holly"?
> >
> > My synonym filter is defined as follows, and is only defined on the query
> > side:
> >
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >
> >
> > I've also tried changing the synonym file to be
> >
> > tom, thomas, boll holly, thomas a, john q
> >
> > This produces the same results.
> >
> > Thanks,
> > Brad
> >
>


[SolrResourceLoader] Unable to load cached class-name

2009-11-24 Thread Stuart Grimshaw
Bit of a long error message, so I won't post it all in the subject :-)

I'm trying to create a log4j solr appender to help us track down log
entries from across our jboss cluster, I might be able to make use of
the faceted search to identify errors that occur more often and things
like that.

Anyway, on to my problem, you can see the source on github
http://github.com/Stubbs/solrIndexAppender

I've deployed the contents of dist/ into JBoss's lib directory for the
server I'm running and I've also copied the contents of lib/ into
there as well. I've also copied the solrj libs into there too, but I
get the following error:

[SolrResourceLoader] Unable to load cached class-name :
org.apache.solr.search.FastLRUCache for shortname : solr.FastLRUCache
java.lang.ClassNotFoundException: org.apache.solr.search.FastLRUCache

I've seen posts that suggest this is because of using 1.3 libs, but
the only 1.3 libs I have are in my maven repo and are not deployed.

-S

Follow me on Twitter: http://twitter.com/stubbs
Blog: http://stubblog.wordpress.com
My art: http://stuartgrimshaw.imagekind.com
Stock Images: http://en.fotolia.com/partner/16775


Re: Boost document base on field length

2009-11-24 Thread Lance Norskog
The Lucene norms, if set, default to 1/sqrt(number of terms in the field).

I cannot find a function that makes norms available. Yo gurus- is this
impossible, a bad idea, or just an oversight?

On Tue, Nov 24, 2009 at 6:06 AM, Tomasz Kępski  wrote:
> Hi,
>
>> I think i'm reading he question differently then Grant -- his suggestion
>> applies when you are searching in the description field, and don't want
>> documents with shorter descriptions to score higher when the same terms
>> match the same number of times (the default behavior of lengthNorm)
>
>> my udnerstanding is that you want documents that don't have a description
>> to score lower then documents that do -- and you might be querying against
>> completely differnet fields (description might not even be indexed)
>>
>> in that case there is no easy way to to achieve this with just the
>> description field ... the easy thing to do is to index a boolean
>> "has_description" field and then incorporate that into your query (or as the
>> input to a function query)
>
> You get my point Hoss. In my case long description = good value. And your
> intuition is amazing ;-) I do have a field which is not used in search at
> all (image url) but docs with image have for me greater value than without
> it.
>
> I would add two fields then (boolean for photo and int for description
> length) fill them up during indexation and would play with them during the
> search.
>
> Thanks,
> Tom
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: PatternTokenizer question

2009-11-24 Thread j philoon

I think the answer to my question is contained in the wiki when discussing
the SynonymFilter, "The Lucene QueryParser tokenizes on white space before
giving any text to the Analyzer".  This would indeed explain what I am
getting.  Next question - can I avoid that behavior?


j philoon wrote:
> 
> I have defined a comma-delimited pattern tokenizer as follows:
>
> <fieldType name="commaDelimited" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> This appears to work fine when adding documents, since if I add a field
> commafld as "word1,WORD2,word 3" I see terms in the index as expected:
> "word1", "word2", and "word 3".
> 
> When I query, I am expecting that the same tokenization would take place,
> so a query that has 'commafld:(word 3)' would match term "word 3". 
> However, I find I have to submit the query as 'commafld:("word 3")'.  That
> is, it seems as if whitespace tokenization is taking place, not the
> comma-delimited tokenization.
> 
> Am I misunderstanding what should be happening or making some basic
> mistake?  Thanks. 
> 

-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26503324.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi-Term Synonyms

2009-11-24 Thread Tom Hill
Hi Brad,


I suspect that this section from the wiki for SynonymFilterFactory might be
relevant:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

*"Keep in mind that while the SynonymFilter will happily work with synonyms
containing multiple words (ie: "**sea biscuit, sea biscit, seabiscuit**")
The recommended approach for dealing with synonyms like this, is to expand
the synonym when indexing. This is because there are two potential issues
that can arrise at query time:*

   1.

   *The Lucene QueryParser tokenizes on white space before giving any text
   to the Analyzer, so if a person searches for the words **sea biscit** the
   analyzer will be given the words "sea" and "biscit" seperately, and will not
   know that they match a synonym."*

   ...
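
In other words, expand at index time instead; a sketch (the tokenizer and
other filters should match whatever your field already uses):

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>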

Tom

On Tue, Nov 24, 2009 at 10:47 AM, brad anderson wrote:

> Hi Folks,
>
> I was trying to get multi term synonyms to work. I'm experiencing some
> strange behavior and would like some feedback.
>
> In the synonyms file I have the line:
>
> thomas, boll holly, thomas a, john q => tom
>
> And I have a document with the text field as;
>
> tom
>
> However, when I do a search on boll holly, it does not return the document
> with tom. The same thing happens if I do a query on john q. But if I do a
> query on thomas, it gives me the document. Also, if I quote "boll holly" or
> "john q" it gives back the document.
>
> When I look at the analyzer page on the solr admin page, it is transforming
> "boll holly" to "tom" when it isn't quoted. Why is it that it is not
> returning the document? Is there some configuration I can make so it does
> return the document if I do an unquoted search on "boll holly"?
>
> My synonym filter is defined as follows, and is only defined on the query
> side:
>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
>
> I've also tried changing the synonym file to be
>
> tom, thomas, boll holly, thomas a, john q
>
> This produces the same results.
>
> Thanks,
> Brad
>


Re: Normalizing multiple Chars with MappingCharFilter possible?

2009-11-24 Thread Andreas Kahl


Am 24.11.09 12:30, schrieb Koji Sekiguchi:
> Andreas Kahl wrote:
>> Hello everyone,
>>
>> is it possible to normalize Strings like '`e' (2 chars) => 'e' (in
>> contrast to 'é' (1 char) => 'e') with
>> org.apache.lucene.analysis.MappingCharFilter?
>>
>> I am asking this because I am considering to index some multilingual
>> and multi-alphabetic data with Solr which uses such Strings as a
>> substitution for 'real' Unicode characters.
>> Thanks for your advice.
>> Andreas
>>
>>
>>   
> Yes. It should work.
> MappingCharFilter supports:
>
> * char-to-char
> * string-to-char
> * char-to-string
> * string-to-string
>
> without misalignment of original offsets (i.e. highlighter works
> correctly with MappingCharFilters).
>
> Koji
>
Thanks Koji. That was all I needed to know.

Andreas
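
For the archives, the setup is a charFilter plus a mapping file; a sketch
(the mapping file name is arbitrary):

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>

and in mapping.txt:

"`e" => "e"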





Multi-Term Synonyms

2009-11-24 Thread brad anderson
Hi Folks,

I was trying to get multi term synonyms to work. I'm experiencing some
strange behavior and would like some feedback.

In the synonyms file I have the line:

 thomas, boll holly, thomas a, john q => tom

And I have a document with the text field as;

 tom

However, when I do a search on boll holly, it does not return the document
with tom. The same thing happens if I do a query on john q. But if I do a
query on thomas, it gives me the document. Also, if I quote "boll holly" or
"john q" it gives back the document.

When I look at the analyzer page on the solr admin page, it is transforming
"boll holly" to "tom" when it isn't quoted. Why is it that it is not
returning the document? Is there some configuration I can make so it does
return the document if I do an unquoted search on "boll holly"?

My synonym filter is defined as follows, and is only defined on the query
side:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>


I've also tried changing the synonym file to be

tom, thomas, boll holly, thomas a, john q

This produces the same results.

Thanks,
Brad


Re: Implementing phrase autopop up

2009-11-24 Thread darniz

Thanks for your input
You made a valid point, if we are using field type as text to get
autocomplete it wont work because it goes through tokenizer.
Hence looks like for my use case i need to have a field which uses ngram and
copy. Here is what i did

i created a field the same as the lucid blog says.

<field name="autocomp" type="autocomplete" indexed="true" stored="true"
omitNorms="true" omitTermFreqAndPositions="true"/>

with the following field configuration

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Now when i query i get the correct phrases, for example if i search for 
autocomp:"how to" i get all the correct phrases like

How to find a car
How to find a mechanic 
How to choose the right insurance company

etc... which is good.

Now I have two questions.
1) Is it necessary to give the query in quotes? My gut feeling is yes,
since if you dont give quotes i get phrases beginning with How followed by
some other words, like How can etc...

2) if i search for a word, for example choose, it gives me nothing.
I was expecting to see a result considering there is the word "choose" in the
phrase 
How to choose the right insurance company

i might look more at documentation but do you have anything to advise?

darniz









Shalin Shekhar Mangar wrote:
> 
> On Tue, Nov 24, 2009 at 10:12 AM, darniz  wrote:
> 
>>
>> hello all
>> Let me first explain the task i am trying to do.
>> i have article with title for example
>> 
>> Car Insurance for Teenage Drivers
>>
>> A Total Loss?
>> 
>> If a user begins to type car insu i want the autopop to show up with the
>> entire phrase.
>> There are two ways to implement this.
>> First is to use the termcomponent and the other is to use a field with
>> field
>> type which uses the solr.EdgeNGramFilterFactory filter.
>>
>> I started with using with Term component and i declared a term request
>> handler and gave the following query
>>
>> http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
>> The issue is that its not giving the entire phrase, it gives me back
>> results
>> like car, caravan, carbon. Now  i know using terms.prefix will only give
>> me
>> results where the sentence start with car. On top of this i also want if
>> there is word like car somewhere in between the title that should also
>> show
>> up in autopop very much similar like google where a word is not
>> necessarily
>> start at the beginning but it could be present anywhere in the middle of
>> the
>> title.
>> The question is whether TermComponent is a good candidate, or whether to
>> use a custom field (let's name it autoPopupText) with a field type
>> configured with all the filters and EdgeNGramFilterFactory defined,
>> copying the title to the autoPopupText field and using it to power the
>> autopopup.
>>
>> The other thing is that using EdgeNGramFilterFactory is more from an index
>> point of view when you index document you need to know which fields you
>> want
>> to copy to autoPopupText field where as using Term component is more like
>> you can define at query time what fields you want to use to fetch
>> autocomplete from.
>>
>> Any idea whats the best and why the Term component is not giving me an
>> entire phrase which i mentioned earlier.
>> FYI
>> my title field is of type text.
>>
> 
> 
> You are using a tokenized field type with TermsComponent therefore each
> word
> in your phrase gets indexed as a separate token. You should use a
> non-tokenized type (such as a string type) with TermsComponent. However,
> this will only let you search by prefix and not by words in between the
> phrase.
> 
> Your best bet here would be to use EdgeNGramFilterFactory. If your index
> is
> very large, you can consider doing a prefix search on shingles too.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Implementing-phrase-autopop-up-tp26490419p26499912.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

Following is the luke response. The fields list is empty. can someone
assist to find out why file content isn't being indexed?

   
 
 
  responseHeader: status 0, QTime 0
  index: numDocs 0, maxDoc 0, numTerms 0, version 1259085661332,
  optimized false, current true, hasDeletions false
  directory: org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index
  lastModified: 2009-11-24T18:01:01Z
  
   
 
 
  Indexed 
  Tokenized 
  Stored 
  Multivalued 
  TermVector Stored 
  Store Offset With TermVector 
  Store Position With TermVector 
  Omit Norms 
  Lazy 
  Binary 
  Compressed 
  Sort Missing First 
  Sort Missing Last 
  
  Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents. 
  
  

javaxmlsoapdev wrote:
> 
> I was able to configure /docs index separately from my db data index.
> 
> still I am seeing same behavior where it only puts .docName & its size in
> the "content" field (I have renamed field to "content" in this new schema)
> 
> below are the only two fields I have in schema.xml
> <field name="key" type="string" indexed="true" stored="true" required="true" />
> <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
> 
> Following is updated code from test case
> 
> File fileToIndex = new File("file.txt");
> 
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978");
> up.setParam("literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList list = server.request(up);
> assertNotNull("Couldn't upload .txt",list);
>   
> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
> assertEquals( 1, rsp.getResults().getNumFound() );
> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
> 
> Also from solr admin UI when I search for "doc123.txt" then only it
> returns me following response. not sure why its not indexing file's
> content into "content" attribute.
>   702 
>   text/plain 
>   doc123.txt 
>
>   
>   8978 
>   
>   
> 
> Any idea?
> 
> Thanks,
> 
> 
> javaxmlsoapdev wrote:
>> 
>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>> its not able to find luke.
>> 
>> I am reusing schema, which is used for indexing other entity from
>> database, which has no relevance to documents. that was my next question
>> that what do I put in, in a schema if my documents don't need any column
>> mappings or anything. plus I want to keep file documents index separately
>> from database entity index. what's the best way to do this? If I don't
>> have any db columns etc to map and the file documents index should live
>> separate from db entity index, what's the best way to achieve this.
>> 
>> thanks,
>> 
>> 
>> 
>> Grant Ingersoll-6 wrote:
>>> 
>>> 
>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>> 
 
 *:* returns me 1 count but when I search for specific word (which was
 part of
 .txt file I indexed before) it doesn't return me anything. I don't have
 luke
 setup on my end.
>>> 
>>> http://localhost:8983/solr/admin/luke should give you some info.
>>> 
>>> 
 let me see if I can set that up quickly but otherwise do
 you see anything I am missing in solrconfig mapping or something?
>>> 
>>> What's your schema look like and how are you querying?
>>> 
 which maps
 document "content" to wrong attribute?
 
 thanks,
 
 Grant Ingersoll-6 wrote:
> 
> 
> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
> 
>> 
>> Following code is from my test case where it tries to index a file
>> (of
>> type
>> .txt)
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978"); //key is the uniqueId
>> up.setParam("ext.literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);   
>> server.request(up);  
>> 
>> test case doesn't give me any error and "I think" its indexing the
>> file?
>> but
>> when I search for a text (which was part of the .txt file) search
>> doesn't
>> return me anything.
> 
> What do your logs show?  Else, what does Luke show or doing a *:*
> query
> (assuming this is the only file you added)?
> 
> Also, I don't think you need ext.literal anymore, just literal.
> 
>> 
>> Following is the config from solrconfig.xml where I have mapped
>> content
>> to
>> "description" field(default search field) in the schema.
>> 
>> > class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>   
>> description
>> description
>>   
>> 
>> 
>> Clearly it seems I am missing something. Any idea?
> 
> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>

Creating Facets

2009-11-24 Thread Tommy Molto
People,

I looked in the solr wiki and only found docs about using facets, not how
to configure it in the schema or solrconfig. Any tip how to do it?

Att,


Re: initiate reindexing in solr for field type changes

2009-11-24 Thread darniz

thanks
darniz


Shalin Shekhar Mangar wrote:
> 
> On Thu, Nov 19, 2009 at 4:50 AM, darniz  wrote:
> 
>>
>> Thanks
>> Could you elaborate what is compatible schema change.
>> Do you mean schema change which deals only with query time.
>>
>>
> A compatible schema change would be addition of new fields. Removal of
> fields may also be called compatible as long as your application does not
> try to index or query them.
> 
> Modifying the field type of an existing field or adding/removing/modifying
> tokenizers or filters on a field type is usually an incompatible change
> and
> needs re-indexing of affected documents.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/initiate-reindexing-in-solr-for-field-type-changes-tp26397067p26499804.html
Sent from the Solr - User mailing list archive at Nabble.com.



SolrPlugin Guidance

2009-11-24 Thread Vauthrin, Laurent
Hello,

 

Our team is trying to make a Solr plugin that needs to parse/decompose a
given query into potentially multiple queries.  The idea is that we're
trying to abstract a complex schema (with different document types) from
the users so that their queries can be simpler.

 

So basically, we're trying to do the following:

 

1.   Decompose query A into query B and query C

2.   Send query B to all shards and plug query B's results into
query C

3.   Send Query C to all shards and pass the results back to the
client

 

I started trying to implement this by subclassing the SearchHandler but
realized that I would not have access to HttpCommComponent.  Then I
tried to replicate the SearchHandler class but realized that I might not
have access to fields I would need in ShardResponse.  So I figured I
should step back and get advice from the mailing list now. :)  What is
the best plugin point for decomposing a query into multiple queries so
that all resultant queries can be sent to each shard?

 

Thanks,
Laurent Vauthrin



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

I was able to configure /docs index separately from my db data index. 

still I am seeing same behavior where it only puts .docName & its size in
the "content" field (I have renamed field to "content" in this new schema) 

below are the only two fields I have in schema.xml 
<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Following is updated code from test case 

File fileToIndex = new File("file.txt"); 

ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract"); 
up.addFile(fileToIndex); 
up.setParam("literal.key", "8978"); 
up.setParam("literal.docName", "doc123.txt"); 
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); 
NamedList list = server.request(up); 
assertNotNull("Couldn't upload .txt",list); 

QueryResponse rsp = server.query( new SolrQuery( "*:*") ); 
assertEquals( 1, rsp.getResults().getNumFound() ); 
System.out.println(rsp.getResults().get(0).getFieldValue("content")); 

Also from solr admin UI when I search for "doc123.txt" then only it returns
me following response. not sure why its not indexing file's content into
"content" attribute. 
  
  
  
  702 
  text/plain 
  doc123.txt 
   
   
  8978 
   
   

Any idea? 

Thanks, 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498946.html
Sent from the Solr - User mailing list archive at Nabble.com.



Trouble Configuring WordDelimiterFilterFactory

2009-11-24 Thread Rahul R
Hello,
In our application we have a catch-all field (the 'text' field) which is
configured as the default search field. Now this field will have a
combination of numbers, alphabets, special characters etc. I have a
requirement that the WordDelimiterFilterFactory not act on numbers,
especially those with decimal points. Accuracy of results with relevance to
numerical data is quite important, So if the text field of a document has
data like "Bridge-Diode 3.55 Volts", I want to make sure that a search for
"355" or "35.5" does not retrieve this document. So I found the following
setting for the WordDelimiterFilterFactory to work for me (for most parts):


I am using the same setting for both index and query.

Now the only problem is, if I have data like ".355". With the above setting,
the analysis jsp shows me that WordDelimiterFilterFactory is creating term
texts as both ".355' and "355". So a search for ".355" retrieves documents
containing both ".355" and "355". A search for "355" also has the same
effect. I noticed that when the entry for the WordDelimiterFilterFactory was
completely removed (both index and query), then the above problem was
resolved. But this seems too harsh a measure.

Is there a way by which I can prevent the WordDelimiterFilterFactory from
totally acting on numerical data ?

Regards
Rahul


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

I was able to configure /docs index separately from my db data index.

still I am seeing same behavior where it only puts .docName & its size in
the "content" field (I have renamed field to "content" in this new schema)

below are the only two fields I have in schema.xml
<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Following is updated code from test case

File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt",list);

QueryResponse rsp = server.query( new SolrQuery( "*:*") );
assertEquals( 1, rsp.getResults().getNumFound() );
System.out.println(rsp.getResults().get(0).getFieldValue("content"));

Also from solr admin UI when I search for "doc123.txt" then only it returns
me following response. not sure why its not indexing file's content into
"content" attribute.
  702 
  text/plain 
  doc123.txt 
   
  
  8978 
  
  

Any idea?

Thanks,


javaxmlsoapdev wrote:
> 
> http://machinename:port/solr/admin/luke gives me 404 error so seems like
> its not able to find luke.
> 
> I am reusing schema, which is used for indexing other entity from
> database, which has no relevance to documents. that was my next question
> that what do I put in, in a schema if my documents don't need any column
> mappings or anything. plus I want to keep file documents index separately
> from database entity index. what's the best way to do this? If I don't
> have any db columns etc to map and the file documents index should live
> separate from db entity index, what's the best way to achieve this.
> 
> thanks,
> 
> 
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> *:* returns me 1 count but when I search for specific word (which was
>>> part of
>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>> luke
>>> setup on my end.
>> 
>> http://localhost:8983/solr/admin/luke should give you some info.
>> 
>> 
>>> let me see if I can set that up quickly but otherwise do
>>> you see anything I am missing in solrconfig mapping or something?
>> 
>> What's your schema look like and how are you querying?
>> 
>>> which maps
>>> document "content" to wrong attribute?
>>> 
>>> thanks,
>>> 
>>> Grant Ingersoll-6 wrote:
 
 
 On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
 
> 
> Following code is from my test case where it tries to index a file (of
> type
> .txt)
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978"); //key is the uniqueId
> up.setParam("ext.literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> server.request(up);   
> 
> test case doesn't give me any error and "I think" its indexing the
> file?
> but
> when I search for a text (which was part of the .txt file) search
> doesn't
> return me anything.
 
 What do your logs show?  Else, what does Luke show or doing a *:* query
 (assuming this is the only file you added)?
 
 Also, I don't think you need ext.literal anymore, just literal.
 
> 
> Following is the config from solrconfig.xml where I have mapped
> content
> to
> "description" field(default search field) in the schema.
> 
>  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   
> description
> description
>   
> 
> 
> Clearly it seems I am missing something. Any idea?
 
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using
 Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 
>>> 
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498552.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Migrating to Solr

2009-11-24 Thread Tommy Molto
This is really a great resource for the migration. I guess I will have good
questions after trying. But what I know will be a little harder is the use of
collections (facets in Solr) and hierarchical navigators.

On Tue, Nov 24, 2009 at 1:05 PM, Shashi Kant  wrote:

> Here is a link that might be helpful:
>
> http://sesat.no/moving-from-fast-to-solr-review.html
>
> The site is choc-a-bloc with great information on their migration
> experience.
>
>
> On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto  wrote:
>
> > Hi,
> >
> > I'm new at Solr and i need to make a "test pilot" of a migration from
> Fast
> > ESP to Apache Solr, anyone had this experience before?
> >
> >
> > Att,
> >
>


Re: access denied to solr home lib dir

2009-11-24 Thread Charles Moad
 Thank you all for the insight into this problem.  I was 100%
positive that selinux and file permissions were not the problem.
Turns out that tomcat 6 on ubuntu comes with a tomcat security manager
enabled by default.  I had no desire to figure out how this works
since this is for local testing.  I did find you could simply set
"TOMCAT6_SECURITY=no" in "/etc/default/tomcat6".  Restarting tomcat
after that fixed my problems.
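
For the archives, the whole change was just (Ubuntu paths; the restart
command may differ on your setup):

    # /etc/default/tomcat6
    TOMCAT6_SECURITY=no

    $ sudo /etc/init.d/tomcat6 restart

If you'd rather keep the security manager on, granting Solr read access in
a Tomcat policy file (on Ubuntu, a file under /etc/tomcat6/policy.d/, if I
remember right) should also work -- untested, and the path is whatever your
solr home is:

    grant {
      permission java.io.FilePermission "/opt/solr/-", "read";
    };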

Thanks again,
 Charlie

On Mon, Nov 23, 2009 at 4:38 PM, Chris Hostetter
 wrote:
>
> : Check.  I even verified that the tomcat user could create the
> : directory (i.e. "sudo -u tomcat6 mkdir /opt/solr/steve/lib").  Still
> : solr complains.
>
> Note that you have an AccessControlException, not a simple
> FileNotFoundException ... the error here is coming from File.canRead (when
> Solr is asking if it has permission to read the file) but your
> ServletContainer evidently has a security policy in place that prevents
> solr from even checking (if the security policy allowed it to check, then
> it would return true/false based on the actual file permissions)...
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#canRead%28%29
>
>    Tests whether the application can read the file denoted by this
>    abstract pathname.
>
>    Returns:
>        true if and only if the file specified by this abstract pathname
>        exists and can be read by the application; false otherwise
>    Throws:
>        SecurityException - If a security manager exists and its
>        SecurityManager.checkRead(java.lang.String) method denies read
>        access to the file
>
> ...note that Tomcat doesn't have any special SecurityManager settings that
> prevent this by default.  Something about your tomcat deployment must be
> specifying specific Security Permission rules.
>
> : >> Caused by: java.security.AccessControlException: access denied
> : >> (java.io.FilePermission /opt/solr/steve/./lib read)
> : >>       at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
> : >>       at 
> java.security.AccessController.checkPermission(AccessController.java:546)
> : >>       at 
> java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
> : >>       at java.lang.SecurityManager.checkRead(SecurityManager.java:871)
> : >>       at java.io.File.canRead(File.java:689)
> : >>       at 
> org.apache.solr.core.SolrResourceLoader.replaceClassLoader(SolrResourceLoader.java:157)
> : >>       at 
> org.apache.solr.core.SolrResourceLoader.addToClassLoader(SolrResourceLoader.java:128)
> : >>       at 
> org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:97)
> : >>       at 
> org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:195)
> : >>       at org.apache.solr.core.Config.(Config.java:93)
> : >>       at 
> org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:65)
> : >>       ... 40 more
>
>
> -Hoss
>


PatternTokenizer question

2009-11-24 Thread j philoon

I have defined a comma-delimited pattern tokenizer as follows:

  <fieldType name="commaDelimited" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This appears to work fine when adding documents, since if I add a field
commafld as "word1,WORD2,word 3" I see terms in the index as expected:
"word1", "word2", and "word 3".

When I query, I am expecting that the same tokenization would take place, so
a query that has 'commafld:(word 3)' would match term "word 3".  However, I
find I have to submit the query as 'commafld:("word 3")'.  That is, it seems
as if whitespace tokenization is taking place, not the comma-delimited
tokenization.

Am I misunderstanding what should be happening or making some basic mistake? 
Thanks. 
-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26497675.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Get one document from each category

2009-11-24 Thread Andrey Klochkov
Hi

I think you need field collapsing, look here

http://wiki.apache.org/solr/FieldCollapsing
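
With the field collapsing (SOLR-236) patch applied, the request would look
something like this -- untested, and the parameter names depend on the patch
version you apply:

    http://localhost:8983/solr/select?q=your+query&collapse.field=category_id&rows=3

Each of the returned documents then comes from a distinct category_id.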

2009/11/24 Tomasz Kępski 

> Hi,
>
> I have the following case:
>
> In my index I do have documents categorized (category_id - int sortable
> field). I would like to get the three top documents matching a user query,
> BUT each has to be from a different category:
>
> for example from returned set (doc_id : category id):
>
> 1:1
> 2:1
> 3:1
> 4:2
> 5:1
> 6:2
> 7:3
> 8:4
>
> I would like to get docs 1, 4 and 7.
> Is that possible without querying 3 times? Often a lot of the docs at the
> beginning (more than my limit) are from the same category.
> I'm using PHP Apache Solr, so I would like to avoid processing large sets of
> data in my PHP-based application.
>
> Tomek
>



-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


Re: Migrating to Solr

2009-11-24 Thread Shashi Kant
Here is a link that might be helpful:

http://sesat.no/moving-from-fast-to-solr-review.html

The site is choc-a-bloc with great information on their migration
experience.


On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto  wrote:

> Hi,
>
> I'm new at Solr and i need to make a "test pilot" of a migration from Fast
> ESP to Apache Solr, anyone had this experience before?
>
>
> Att,
>


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-24 Thread javaxmlsoapdev

http://machinename:port/solr/admin/luke gives me a 404 error, so it seems
like it's not able to find Luke.

I am reusing a schema which is used for indexing another entity from the
database and which has no relevance to these documents. That was my next
question: what do I put in a schema if my documents don't need any column
mappings or anything? Plus, I want to keep the file-document index separate
from the database entity index. If I don't have any db columns to map, and
the file-document index should live separately from the db entity index,
what's the best way to achieve this?

thanks,



Grant Ingersoll-6 wrote:
> 
> 
> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
> 
>> 
>> *:* returns me 1 count, but when I search for a specific word (which was
>> part of the .txt file I indexed before) it doesn't return me anything. I
>> don't have Luke set up on my end.
> 
> http://localhost:8983/solr/admin/luke should give you some info.
> 
> 
>> let me see if I can set that up quickly but otherwise do
>> you see anything I am missing in solrconfig mapping or something?
> 
> What's your schema look like and how are you querying?
> 
>> which maps
>> the document "content" to the wrong attribute?
>> 
>> thanks,
>> 
>> Grant Ingersoll-6 wrote:
>>> 
>>> 
>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>> 
 
 Following code is from my test case where it tries to index a file (of
 type
 .txt)
 ContentStreamUpdateRequest up = new
 ContentStreamUpdateRequest("/update/extract");
 up.addFile(fileToIndex);
 up.setParam("literal.key", "8978"); //key is the uniqueId
 up.setParam("ext.literal.docName", "doc123.txt");
 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); 
 server.request(up);
 
 The test case doesn't give me any error, and "I think" it's indexing the
 file? But when I search for a text (which was part of the .txt file) the
 search doesn't return me anything.
>>> 
>>> What do your logs show?  Else, what does Luke show, or what does a *:*
>>> query return (assuming this is the only file you added)?
>>> 
>>> Also, I don't think you need ext.literal anymore, just literal.
>>> 
 
 Following is the config from solrconfig.xml where I have mapped content to
 the "description" field (the default search field) in the schema:

 <requestHandler name="/update/extract"
     class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
   <lst name="defaults">
     <str name="fmap.content">description</str>
     <str name="defaultField">description</str>
   </lst>
 </requestHandler>
 
 Clearly it seems I am missing something. Any idea?
>>> 
>>> 
>>> 
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>> 
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>> 
>>> 
>>> 
>> 
>> -- 
>> View this message in context:
>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26497295.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Migrating to Solr

2009-11-24 Thread Grant Ingersoll
I've done been involved with a fair share of these migrations now, what are you 
looking for?

On Nov 24, 2009, at 8:55 AM, Tommy Molto wrote:

> Hi,
> 
> I'm new at Solr and i need to make a "test pilot" of a migration from Fast
> ESP to Apache Solr, anyone had this experience before?
> 
> 
> Att,

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Get one document from each category

2009-11-24 Thread Tomasz Kępski

Hi,

I have the following case:

In my index I do have documents categorized (category_id - int sortable 
field). I would like to get the three top documents matching a user query,
BUT each has to be from a different category:


for example from returned set (doc_id : category id):

1:1
2:1
3:1
4:2
5:1
6:2
7:3
8:4

I would like to get docs 1, 4 and 7.
Is that possible without querying 3 times? Often a lot of the docs at the
beginning (more than my limit) are from the same category.
I'm using PHP Apache Solr, so I would like to avoid processing large sets
of data in my PHP-based application.


Tomek


Re: Migrating to Solr

2009-11-24 Thread Lukáš Vlček
Hi,

I think there were some links about FAST to Solr migration published
recently. See:
http://blog.isabel-drost.de/index.php/archives/110/moving-from-fast-to-solr
However, as of this writing those links are not working; not sure what happened...

Regards,
Lukas


On Tue, Nov 24, 2009 at 2:55 PM, Tommy Molto  wrote:

> Hi,
>
> I'm new at Solr and i need to make a "test pilot" of a migration from Fast
> ESP to Apache Solr, anyone had this experience before?
>
>
> Att,
>


Re: Boost document based on field length

2009-11-24 Thread Tomasz Kępski

Hi,

I think I'm reading the question differently than Grant -- his suggestion
applies when you are searching in the description field, and don't want
documents with shorter descriptions to score higher when the same terms
match the same number of times (the default behavior of lengthNorm)

my understanding is that you want documents that don't have a description
to score lower than documents that do -- and you might be querying against
completely different fields (description might not even be indexed)

in that case there is no easy way to achieve this with just the
description field ... the easy thing to do is to index a boolean
"has_description" field and then incorporate that into your query (or as
the input to a function query)


You get my point, Hoss. In my case a long description = good value. And
your intuition is amazing ;-) I do have a field which is not used in
search at all (image url), but docs with an image have greater value for
me than those without.

I would add two fields then (a boolean for the photo and an int for the
description length), fill them in during indexing, and play with them at
search time.
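
Something like this, I guess (a sketch -- the field names and weights are
made up):

    <field name="has_photo"   type="int" indexed="true" stored="false"/>
    <field name="desc_length" type="int" indexed="true" stored="false"/>

and then, with dismax, something along the lines of:

    &bf=sum(product(has_photo,2),product(desc_length,0.01))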


Thanks,
Tom



Migrating to Solr

2009-11-24 Thread Tommy Molto
Hi,

I'm new at Solr and i need to make a "test pilot" of a migration from Fast
ESP to Apache Solr, anyone had this experience before?


Att,


Re: Turning down logging for SOLR running on Weblogic

2009-11-24 Thread Mark Miller
DEO, SHANTANU S (ATTCINW) wrote:
> Hi 
>  We recently started a SOLR instance running under Weblogic and noticed
> that there are a lot of DEBUG messages being output, that we did not
> notice before when we used tomcat. 
> Where can we turn this logging level down ?
>
> Thanks
> Shantanu
> AT&T eCommerce Web Hosting - Release Management
> Office: (425)288-6081
> email: sd1...@att.com
>   

If you are using Solr 1.4 you can use different logging frameworks - by
default it's java util logging - so you just use a standard java util
logging config to set your logging levels:
http://wiki.apache.org/solr/SolrLogging

I believe it uses a default config in the JRE folder if you don't set
your own config (with a system property on starting your container) - so
perhaps that got set to debug? It normally defaults to info.
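
For example, a minimal logging.properties (just a sketch -- adjust the
levels and paths for your setup):

    .level = INFO
    org.apache.solr.level = INFO
    java.util.logging.ConsoleHandler.level = INFO

and point the JVM at it when starting Weblogic:

    -Djava.util.logging.config.file=/path/to/logging.properties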

- Mark


Re: [N to M] range search out of sum of field. howto search this?

2009-11-24 Thread Julian Davchev
Hi,
You got exactly what I am after.
Seems I will have to find a workaround for this one.
Also I am still stuck on 1.3 so..
Thanks a lot
JD

Chris Hostetter wrote:
> : fq={!frange l=5 u=10}sum(user,num)
>
> Hmmm, one of us massively misunderstood the original question - and I'm
> pretty sure it's Yonik.
>
> I don't think he wants results where the user field plus the num field are
> in the range of 5-10 ... I think he wants the list of user Ids (which are
> numbers in his examples, but could just as easily be strings) where the
> sum of the "num" fields over all documents that have the same value in the
> "user" field falls within the range.
>
> I can't think of any easy way to do that ... it isn't the kind of thing an
> Inverted Index is particularly good at.  But maybe there's something in
> the Field Collapsing patch (searching the archives/wiki will bring up
> pointers) that can filter on stats like this?
>
> : On Mon, Nov 23, 2009 at 8:49 AM, Julian Davchev  wrote:
> : > Hi folks,
> : > I got documents like
> : > user:1   num:5
> : > user:1   num: 8
> : > user:5   num:7
> : > user:5   num:1
> : > 
> : >
> : >
> : > I'd like to get per user that matches sum of num range 5 to 10
> : > In this case it should return user 5  as 7+1=8 and is within range.
> : > User 1 will be false cause sum of num is 5+8=13 hence outside range 5 to 
> 10
>
> -Hoss
>   



Turning down logging for SOLR running on Weblogic

2009-11-24 Thread DEO, SHANTANU S (ATTCINW)
Hi 
 We recently started a SOLR instance running under Weblogic and noticed
that there are a lot of DEBUG messages being output, that we did not
notice before when we used tomcat. 
Where can we turn this logging level down ?

Thanks
Shantanu
AT&T eCommerce Web Hosting - Release Management
Office: (425)288-6081
email: sd1...@att.com


Re: help with dataimport delta query

2009-11-24 Thread Joel Nylund
Thanks, that was it -- well, really this part:

${dataimporter.delta.job_jobs_id}

I thought the jobs_id was part of the DIH, but I guess it was just the example, 
duh!

thanks
Joel
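
PS: for anyone finding this thread later, the working entity ended up looking
roughly like this (a trimmed sketch):

    <entity name="item" pk="id"
        query="SELECT f.id, f.title FROM Book f WHERE f.inMyList=1"
        deltaImportQuery="SELECT f.id, f.title FROM Book f
                          WHERE f.id='${dataimporter.delta.id}'"
        deltaQuery="SELECT id FROM Book WHERE inMyList=1 AND
                    lastModifiedDate > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>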


--- On Tue, 11/24/09, Noble Paul നോബിള്‍  नोब्ळ्  
wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: help with dataimport delta query
> To: solr-user@lucene.apache.org
> Date: Tuesday, November 24, 2009, 12:15 AM
> I guess the field names do not match
> in the deltaQuery you are selecting the field id
> 
> and in the deltaImportQuery you us the field as
> ${dataimporter.delta.job_jobs_id}
> I guess it should be ${dataimporter.delta.id}
> 
> On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund 
> wrote:
> > Hi, I have solr all working nicely, except im trying
> to get deltas to work
> > on my data import handler
> >
> > Here is a simplification of my data import config, I
> have a table called
> > "Book" which has categories, im doing subquries for
> the category info and
> > calling a javascript helper. This all works perfectly
> for the regular query.
> >
> > I added these lines for the delta stuff:
> >
> >     deltaImportQuery="SELECT f.id, f.title
> >                       FROM Book f
> >                       WHERE f.id='${dataimporter.delta.job_jobs_id}'"
> >     deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> >                 lastModifiedDate > '${dataimporter.last_index_time}'" >
> >
> > basically im trying to rows that lastModifiedDate is
> newer than the last
> > index (or deltaindex).
> >
> > I run:
> > http://localhost:8983/solr/dataimport?command=delta-import
> >
> > And it says in logs:
> >
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DataImporter
> > doDeltaImport
> > INFO: Starting Delta Import
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.SolrWriter
> > readIndexerProperties
> > INFO: Read dataimport.properties
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > doDelta
> > INFO: Starting delta collection.
> > Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore
> execute
> > INFO: [] webapp=/solr path=/dataimport
> params={command=delta-import}
> > status=0 QTime=0
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Running ModifiedRowKey() for Entity: category
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed ModifiedRowKey for Entity: category
> rows obtained : 0
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed DeletedRowKey for Entity: category
> rows obtained : 0
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed parentDeltaQuery for Entity: category
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Running ModifiedRowKey() for Entity: item
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed ModifiedRowKey for Entity: item rows
> obtained : 0
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed DeletedRowKey for Entity: item rows
> obtained : 0
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > collectDelta
> > INFO: Completed parentDeltaQuery for Entity: item
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > doDelta
> > INFO: Delta Import completed successfully
> > Nov 23, 2009 2:33:02 PM
> org.apache.solr.handler.dataimport.DocBuilder
> > execute
> > INFO: Time taken = 0:0:0.21
> >
> > But the browser says no documents added/modified (even
> though one record in
> > db is a match)
> >
> > Is there a way to turn debugging so I can see the
> queries the DIH is sending
> > to the db?
> >
> > Any other ideas of what I could be doing wrong?
> >
> > thanks
> > Joel
> >
> >
> > <entity name="item" pk="id"
> >     query="SELECT f.id, f.title
> >            FROM Book f
> >            WHERE f.inMyList=1"
> >     deltaImportQuery="SELECT f.id, f.title
> >                       FROM Book f
> >                       WHERE f.id='${dataimporter.delta.job_jobs_id}'"
> >     deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> >                 lastModifiedDate > '${dataimporter.last_index_time}'" >
> >
> >   <field column="id" name="id" />
> >   <field column="title" name="title" />
> >   <entity name="category" transformer="script:SplitAndPrettyCategory"
> >       query="select fc.bookId,
> >              group_concat(cr.name) as categoryName
> >              from BookCat fc
> >              where fc.bookId = '${item.id}'
> >              group by fc.bookId">
> >     <field column="categoryType" name="categoryType" />
> >   </entity>
> > </entity>
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: initiate reindexing in solr for field type changes

2009-11-24 Thread Shalin Shekhar Mangar
On Thu, Nov 19, 2009 at 4:50 AM, darniz  wrote:

>
> Thanks
> Could you elaborate on what a compatible schema change is?
> Do you mean a schema change which affects only query time?
>
>
A compatible schema change would be addition of new fields. Removal of
fields may also be called compatible as long as your application does not
try to index or query them.
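
For example, dropping a new field into schema.xml is a safe change (old
documents simply won't have a value for it):

    <field name="newField" type="string" indexed="true" stored="true"/>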

Modifying the field type of an existing field or adding/removing/modifying
tokenizers or filters on a field type is usually an incompatible change and
needs re-indexing of affected documents.

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr+jetty logging to syslog?

2009-11-24 Thread Shalin Shekhar Mangar
On Sun, Nov 22, 2009 at 2:39 AM, Steve Conover  wrote:

> Does no one send solr logging to syslog?
>
> On Thu, Nov 19, 2009 at 5:54 PM, Steve Conover  wrote:
> > The solution involves slf4j to log4j to syslog (at least, for solr),
> > but I'm having some trouble stringing all the parts together.  If
> > anyone is doing this, would you mind posting how you use slf4j-log4j
> > jar, what your log4j.properties looks like, what your java system
> > properties settings are, and anything else you think is relevant?
> >
>

I guess you may get better help if you ask this on slf4j or the log4j
mailing lists.
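
That said, the log4j side of it is just a SyslogAppender -- something like
this in log4j.properties (an untested sketch; the host, facility, and
pattern are yours to pick):

    log4j.rootLogger=INFO, SYSLOG
    log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
    log4j.appender.SYSLOG.syslogHost=localhost
    log4j.appender.SYSLOG.facility=LOCAL0
    log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
    log4j.appender.SYSLOG.layout.ConversionPattern=solr: %-5p %c - %m%n

plus the slf4j-log4j12 and log4j jars on the classpath in place of the
slf4j-jdk14 binding, and -Dlog4j.configuration pointing at the file if it
is not on the classpath.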

-- 
Regards,
Shalin Shekhar Mangar.


Re: Huge load and long response times during search

2009-11-24 Thread Tomasz Kępski

Hi,


: I'm using SOLR(1.4) to search among about 3,500,000 documents. After the
: server kernel was updated to 64bit, the system has started to suffer.

...if the *only* thing that was upgraded was switching the kernel from 
32bit to 64bit, then perhaps you are getting bit by java now using 64 bit 
pointers instead of 32 bit pointers, causing a lot more ram to be eaten up 
by the pointers?


it's not something I've done a lot of testing on, but I've heard other
people claim that it can cause some serious problems if you don't actually
need 64bit pointers for accessing huge heaps.


...that said, you should really double check exactly what changed
when your server was upgraded ... perhaps the upgrade included a new
filesystem type, or changes to RAID settings, or even hardware changes ...
if your problems started when an upgrade took place, then looking into
what exactly changed during the upgrade should be your first step.


The kernel was the only thing which was changed. There was no hardware
update, and nobody touched the filesystem either. So now this is a 32-bit
Debian with a 64-bit kernel.
I have heard from our admins that the previous kernel had a grsec patch
which regularly killed java processes with signal 11.


To find out whether SOLR alone is the problem or whether it is the
coexistence of other services on one machine, we are going to move solr to
another machine (same configuration) which is lightly used (a small php app
providing data from memcache, refilled once per hour).


Tom


Re: Normalizing multiple Chars with MappingCharFilter possible?

2009-11-24 Thread Koji Sekiguchi

Andreas Kahl wrote:

Hello everyone,

is it possible to normalize Strings like '`e' (2 chars) => 'e' (in contrast to 'é' 
(1 char) => 'e') with org.apache.lucene.analysis.MappingCharFilter?

I am asking this because I am considering indexing some multilingual and multi-alphabetic data with Solr which uses such Strings as a substitution for 'real' Unicode characters. 

Thanks for your advice. 


Andreas


  

Yes. It should work.
MappingCharFilter supports:

* char-to-char
* string-to-char
* char-to-string
* string-to-string

without misalignment of original offsets (i.e. highlighter works
correctly with MappingCharFilters).
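
For example (a sketch), in a mapping file such as mapping.txt:

    # '`e' (backtick + e) and 'é' both map to plain 'e'
    "\u0060e" => "e"
    "\u00E9" => "e"

and in schema.xml, before the tokenizer of your analyzer:

    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>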

Koji

--
http://www.rondhuit.com/en/



Re: creating Lucene document from an external XML file.

2009-11-24 Thread Phanindra Reva
Hello...,
 Thank you both for patiently reading and understanding my question.
//  " you already have code that builds up files in the
"..." update
message syntax solr expects, but you want to modify those documents (wi/o
changing your existing code) .. " .. //
  yeah.. I already have the document collection. I have to
change values of some fields of all the documents before indexing.

// "  one possibility to think about is that instead of modifying the documents
before sending them to Solr, you could write an UpdateProcessor that runs
directly in Solr and gets access to those Documents after Solr has already
parsed that XML (or even if the documents come from someplace else, like
DIH, or a CSV file) and then make your changes. " //
   I have decided not to modify the documents themselves; instead I'll
modify them at run time (modifying the Java object's variables that
contain the information extracted from the document file).
My question is: is there any part of the API which takes a document file
path as input, returns a Java object, and gives us a way to modify it in
between, before sending the same object for indexing (to the IndexWriter -
Lucene API)?
  I think Otis gave the answer: there is no such API; instead, go for
external Java XML APIs to complete the task.
I am sorry if my description is really making things complicated.
Thanks.
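
PS: for the archives, my reading of the UpdateProcessor route Hoss suggests
is roughly the following -- an untested sketch, where the field name and the
trim are just stand-ins for whatever change you need:

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class FixFieldsProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            // the document Solr parsed from the incoming update XML
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object val = doc.getFieldValue("myfield"); // hypothetical field
            if (val != null) {
              // adjust the value before it reaches the IndexWriter
              doc.setField("myfield", val.toString().trim());
            }
            super.processAdd(cmd);
          }
        };
      }
    }

wired into solrconfig.xml via an updateRequestProcessorChain.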


On Mon, Nov 23, 2009 at 9:36 PM, Chris Hostetter
 wrote:
>
> : If I understand you correctly, you really want to be constructing
> : SolrInputDocuments (not Lucene's Documents) and indexing those with
> : SolrJ.  I don't think there is anything in the API that can read in an
>
> I read your question differently than Otis did.  My understanding is that
> you already have code that builds up files in the "<add><doc>..." update
> message syntax solr expects, but you want to modify those documents (w/o
> changing your existing code)
>
> one possibility to think about is that instead of modifying the documents
> before sending them to Solr, you could write an UpdateProcessor tha runs
> direclty in Solr and gets access to those Documents after Solr has already
> parsed that XML (or even if the documents come from someplace else, like
> DIH, or a CSV file) and then make your changes.
>
>
> If Otis and I have *both* misunderstood your question, please clarify.
>
>
>
> -Hoss
>
>


Re: error with multicore CREATE action

2009-11-24 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 11:49 PM, Chris Harris  wrote:

> Are there any use cases for CREATE where the instance directory
> *doesn't* yet exist? I ask because I've noticed that Solr will create
> an instance directory for me sometimes with the CREATE command. In
> particular, if I run something like
>
> http://solrhost/solr/admin/cores?action=CREATE&name=newcore&instanceDir=d
> :\dir_that_does_not_exist\&config=C:\dir_that_does_exist\solrconfig.xml&schema=C:\dir_that_does_exist\schema.xml
>
> then Solr will create
>
> d:\dir_that_does_not_exist
>
> and
>
> d:\dir_that_does_not_exist\data
>
>
I guess when you try to add documents and an IndexWriter is opened, the data
directory is created if it does not exist. Since it calls File#mkdirs, all
parent directories are also created. I don't think Solr creates those
directories by itself.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Tomcat vs Jetty for a solr instance?

2009-11-24 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 3:14 PM, Kevin Jackson  wrote:

> Hi,
>
> We're running a high traffic (very high peak load) site with solr 1.3
> (we can't upgrade to 1.4 just yet as we don't have capacity to remove
> one of our servers even for the 10 mins time it will take!)
>
> We're currently running the webapp deployed in tomcat 6.0.18.  Most of
> the documentation mentions jetty 6.x and we're just wondering if there
> is any advantage to either servlet container?
>
>
Really? Both Jetty and Tomcat have corresponding wiki pages for installation
instructions. Solr isn't really biased towards either of them.


> We are using tomcat due to familiarity, but there's no reason why we
> couldn't swap if there was a compelling reason.
>
>
No need to change. Both work great. If it is any consolation, we (AOL) use
Tomcat and I believe Lucid's certified Solr distribution also ships with
Tomcat. Jetty is just smaller and easier to embed and therefore Solr uses it
for testing and examples, though there are lots of shops which use the Jetty
shipped with Solr releases.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-11-24 Thread Shalin Shekhar Mangar
On Fri, Nov 20, 2009 at 9:13 PM, javaxmlsoapdev  wrote:

>
> Did you extend DIH to do this work? Can you share code samples? I have a
> similar requirement where I need to index database records, and each record
> has a column with a document path, so I need to create another index for
> documents (we allow users to search both indexes separately) in parallel
> with reading some metadata of the documents from the database as well. I
> have all sorts of different document formats to index. FYI, I am on solr
> 1.4.0. Any pointers would be appreciated.
>
>
He did not extend DIH for this. He extracted out text from his documents and
saved them into files and used XPathEntityProcessor (you can use
PlainTextEntityProcessor) to index them.
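
Something along these lines should work with DIH -- a rough, untested sketch
(the paths and field names are made up):

    <dataConfig>
      <dataSource type="FileDataSource"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/path/to/extracted/docs" fileName=".*\.txt"
                rootEntity="false">
          <entity name="text" processor="PlainTextEntityProcessor"
                  url="${files.fileAbsolutePath}">
            <field column="plainText" name="content"/>
          </entity>
        </entity>
      </document>
    </dataConfig>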

I don't know much about ExtractingRequestHandler but if you want to use DIH,
you'll have to extend it to add Tika support. You may want to look at a
couple of open issues:

   1. https://issues.apache.org/jira/browse/SOLR-1358
   2. https://issues.apache.org/jira/browse/SOLR-1583

-- 
Regards,
Shalin Shekhar Mangar.


Re: Output all, from one field

2009-11-24 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 2:31 AM, Chris Hostetter
wrote:

>
> : Do you want to return just one field from all documents? If yes, you can:
> :
> :1. Query with q=*:*&fl=name
> :2. Use TermsComponent - http://wiki.apache.org/solr/TermsComponent
>
> note that those are very different creatures ... #1 gives you all of the
> stored values for every document.  #2 gives you all of the indexed terms
> (some of which may have all come from a single indexed value)
>
>
That is true but the OP did not specify which he wants. Thanks for the
clarification though, I forgot to specify that.
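
For completeness, the two look like this (assuming the /terms handler from
the example solrconfig.xml is enabled):

    http://localhost:8983/solr/select?q=*:*&fl=name&rows=1000
    http://localhost:8983/solr/terms?terms.fl=name&terms.limit=-1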

-- 
Regards,
Shalin Shekhar Mangar.


Tomcat vs Jetty for a solr instance?

2009-11-24 Thread Kevin Jackson
Hi,

We're running a high traffic (very high peak load) site with solr 1.3
(we can't upgrade to 1.4 just yet as we don't have capacity to remove
one of our servers even for the 10 mins time it will take!)

We're currently running the webapp deployed in tomcat 6.0.18.  Most of
the documentation mentions jetty 6.x and we're just wondering if there
is any advantage to either servlet container?

We are using tomcat due to familiarity, but there's no reason why we
couldn't swap if there was a compelling reason.

Thanks,
Kev


Normalizing multiple Chars with MappingCharFilter possible?

2009-11-24 Thread Andreas Kahl
Hello everyone,

is it possible to normalize Strings like '`e' (2 chars) => 'e' (in contrast to 
'é' (1 char) => 'e') with org.apache.lucene.analysis.MappingCharFilter?

I am asking this because I am considering indexing some multilingual and 
multi-alphabetic data with Solr which uses such Strings as a substitution for 
'real' Unicode characters. 

Thanks for your advice. 

Andreas