Re: How to configure Solr in Glassfish ?

2009-07-21 Thread huenzhao

Yes, I don't know how to set solr.home in Glassfish on CentOS.
I tried to configure solr.home, but the error log says: looking for
solr.xml: /var/deploy/solr/solr.xml
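One common way to set solr.home in Glassfish is as a JVM system property. A sketch only, assuming Glassfish's asadmin tool is available and the default domain name is domain1; the path is taken from the error message above:

```shell
# Sketch: register solr.home as a JVM option, then restart so it takes
# effect. Domain name and path are assumptions; adjust for your install.
asadmin create-jvm-options "-Dsolr.solr.home=/var/deploy/solr"
asadmin restart-domain domain1
```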






markrmiller wrote:
 
 What have you tried? Deploying the Solr war should be pretty
 straightforward. The main issue is likely setting solr.home. You likely
 have a lot of options there though. You can set a system property in the
 startup script, set a system property in the webapp context xml (if you
 can locate it), or I think Glassfish offers a GUI to set such things.
 There really shouldn't be much more to it than that, but you should try
 and see what you run into.
 I haven't tried out Glassfish in a couple of years now.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 On Mon, Jul 20, 2009 at 8:27 AM, huenzhao huenz...@126.com wrote:
 

 I want to use Glassfish as the Solr search server, but I don't know how to
 configure it.
 Does anybody know?

 enzhao...@gmail.com
 Thanks!

 --
 View this message in context:
 http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24565758.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 --
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24582232.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr indexing on same set of records with different value of unique field...not working...

2009-07-21 Thread Noor

hi,

I need to index around 10 million records with Solr.
I have nearly 2 lakh (200,000) records, so I wrote a program that loops
over them until 10 million are indexed.
I specified 20 fields in the schema.xml file; the unique field I set
was a currentTimeStamp field.
So, when I run the loader program (which loads XML data into Solr), it
creates a currentTimeStamp value and loads it into Solr.


To test this situation,
I stopped the loader program after 100 records had been indexed into Solr.
Then I ran the loader program again for the SAME 100 records.

Solr reports 100 documents, rather than 200.

Because I set the currentTimeStamp field as the uniqueKey, I expected the
result to be 200 when I re-ran the same 100 records...
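Solr's uniqueKey behaves like a map key: re-adding a document whose key value already exists replaces the earlier copy rather than adding a second one. So if the second run produces 100 rather than 200 documents, the loader is almost certainly re-sending the same currentTimeStamp values as the first run (for instance, timestamps generated once and saved in the source XML), not fresh ones. A toy sketch of that behaviour, not Solr's actual implementation:

```python
# Toy model of Solr's uniqueKey: an add with an existing key value
# overwrites the old document instead of appending a new one.

def add_documents(index, docs, key_field):
    """Add docs (list of dicts) to index (a dict keyed by uniqueKey)."""
    for doc in docs:
        index[doc[key_field]] = doc  # same key -> overwrite, not append
    return index

index = {}
batch = [{"id": f"ts-{n}", "text": "record"} for n in range(100)]

add_documents(index, batch, "id")
print(len(index))  # 100

# Re-running the loader with the SAME key values leaves the count at 100.
add_documents(index, batch, "id")
print(len(index))  # still 100, not 200
```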


Any suggestions please...

regards,
Noor




Linguistic variation support

2009-07-21 Thread prerna07

Hi,

I am implementing linguistic variations in the Solr search engine. I want to
implement this for US/UK/CA/AU English,
e.g. color (US) = colour (UK):
when a user searches for either word, both results should appear.

I don't want to use synonyms.txt, as this would make synonyms.txt very long.

Please let me know how we can do this.
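One alternative to a long synonyms.txt is to normalize regional spellings to a canonical form with a small set of rules, applied at both index and query time. A hypothetical sketch of the idea (the rules and exception list here are illustrative only; a production filter needs a curated rule set):

```python
import re

# Hypothetical normalization rules mapping UK/CA/AU spellings to US forms.
RULES = [
    (re.compile(r"our\b"), "or"),          # colour -> color
    (re.compile(r"isation\b"), "ization"), # organisation -> organization
    (re.compile(r"ogue\b"), "og"),         # catalogue -> catalog
]
# Words the rules would wrongly rewrite; a real list would be much longer.
EXCEPTIONS = {"our", "hour", "four", "tour", "sour", "vogue"}

def normalize(token: str) -> str:
    """Map a lowercased token to a canonical (US) spelling."""
    if token in EXCEPTIONS:
        return token
    for pattern, repl in RULES:
        token = pattern.sub(repl, token)
    return token

# Applied to both documents and queries, 'colour' and 'color' then match.
print(normalize("colour"))        # color
print(normalize("organisation"))  # organization
```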

Thanks,
Prerna


-- 
View this message in context: 
http://www.nabble.com/Linguistic-variation-support-tp24583581p24583581.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Lemmatisation support in Solr

2009-07-21 Thread JCodina

I think that to get the best results you need some kind of natural language
processing.
I'm trying to do this using UIMA, but I need to integrate it with Solr, as I
explain in this post:
http://www.nabble.com/Solr-and-UIMA-tc24567504.html


prerna07 wrote:
 
 Hi,
 
 I am implementing lemmatisation in Solr, which means that if a user looks
 for Mouse then it should display results for both Mouse and Mice. I
 understand that this is a kind of contextual search. I thought of using
 synonyms for this, but then synonyms.txt would end up with very many
 records, and it would keep growing.
 
 Please suggest how I can implement it in some other way.
 
 Thanks,
 Prerna
 

-- 
View this message in context: 
http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583841.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: method inform of SolrCoreAware callled 2 times

2009-07-21 Thread Marc Sturlese

I am using a nightly build from mid-June.

Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 It is not normal for inform() to be called twice for a single object.
 Which version of Solr are you using?
 
 On Mon, Jul 20, 2009 at 7:17 PM, Marc Sturlesemarc.sturl...@gmail.com
 wrote:

 Hey there,
 I have implemented a custom component which extends SearchComponent and
 implements SolrCoreAware.
 I have declared it in solrconfig.xml as:

   <searchComponent name="mycomp" class="solr.MyCustomComponent" />

 And added it to my SearchHandler as:

   <arr name="last-components">
     <str>mycomp</str>
   </arr>

 I am using multicore with two cores.
 I have noticed (doing some logging) that the inform method (the one that
 implements SolrCoreAware) is being called 2 times per core when I start
 my Solr instance. As I understand it, the SolrCoreAware inform method
 should be called just once per core; am I right, or is it normal that it
 is called 2 times per core?



 --
 View this message in context:
 http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24570221.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 
 

-- 
View this message in context: 
http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24584667.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: index version on slave

2009-07-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
On the slave this command does not work well. The indexversion it reports is
not the actual index version; it is the current replicateable index
version.

Why do you call that API directly?


On Tue, Jul 21, 2009 at 12:53 AM, solr jaysolr...@gmail.com wrote:
 If you ask for the index version of a slave instance, you always get a
 version number of 0. Is this expected behavior?

 I am using this url

 http://slave_host:8983/solr/replication?command=indexversion

 This request returns correct version on master.

 If you use the 'details' command, you get the right version number (and
 generation number, and it gives more than what you want).

 Thanks,

 --
 J




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Highlight arbitrary text

2009-07-21 Thread Anders Melchiorsen
On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen
m...@cup.kalibalik.dk wrote:
 On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher
 e...@ehatchersolutions.com wrote:

 One trick worth noting is the FieldAnalysisRequestHandler can provide
 offsets from external text, which could be used for client-side
 highlighting (see the showmatch parameter too).

 Thanks. I tried doing this, and it almost works.

 However, in the normal highlighter, I am using usePhraseHighlighter and
 highlightMultiTerm and it seems that there is no way to turn these on in
 FieldAnalysisRequestHandler ?

In case these options are not available with the FieldAnalysisRequestHandler,
would it be simple to implement them with a plugin? The highlightMultiTerm
option is absolutely needed, as we use a lot of prefix searches.


Thanks,
Anders.



Re: Solr and UIMA

2009-07-21 Thread Grant Ingersoll


On Jul 20, 2009, at 6:43 AM, JCodina wrote:

D: Break things down. The CAS would only produce XML that Solr can process.
Then different Tokenizers can be used to deal with the data in the CAS. The
main point is that the XML has the doc and field labels of Solr.


I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is
along the lines of what you are thinking, but I haven't done all that much
with UIMA.

I also suspect the Tee/Sink capabilities of Lucene could be helpful, but
they aren't available in Solr yet.



E: The set of capabilities to process the XML is defined in XML, similar to
Lucas, to define the output, and in the Solr schema to define how this is
processed.


I want to use it in order to index something that is common, but I can't
get any tool to do it with Solr: indexing a word and coding, at the same
position, its syntactic and semantic information. I know that in Lucene
this is evolving and it will be possible to include metadata, but for the
moment...


What does Lucas do with Lucene?  Is it putting multiple tokens at the same
position or using Payloads?


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: to index Ms-outlook(.Pst) files to solr tika

2009-07-21 Thread Grant Ingersoll
http://wiki.apache.org/solr/ExtractingRequestHandler contains several
examples of posting files to Solr for Tika.


FYI, I don't know if PST files are supported by Tika.

-Grant

On Jul 21, 2009, at 4:38 AM, Brindha wrote:



Hi,
How do I index MS Outlook (.pst) files into Solr with Tika? I have posted
an MS Outlook (.pst) file directly to Solr; the file gets posted, but with
empty content.
--
View this message in context: 
http://www.nabble.com/to-index-Ms-outlook%28.Pst%29-files-to-solr-tika-tp24583846p24583846.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Lemmatisation support in Solr

2009-07-21 Thread Grant Ingersoll
Sounds like you need a TokenFilter that does lemmatisation.  I don't know
of any open ones off hand, but I haven't looked all that hard.


On Jul 21, 2009, at 4:25 AM, prerna07 wrote:



Hi,

I am implementing Lemmatisation in Solr, which means if user looks for
Mouse then it should display results of Mouse and Mice both. I
understand that this is something context search. I think of using synonym
for this but then synonyms.txt will be having so many records and this
will keep on adding.

Please suggest how I can implement it in some other way.

Thanks,
Prerna
--
View this message in context: 
http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583655.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: index version on slave

2009-07-21 Thread solr jay
Oh: in case the index data is corrupted on a slave, I want to download the
entire index from the master. During the download, I want the slave to be
out of service, and to put it back in service after it has finished. I was
trying to figure out how to determine when the download is done. Right now,
I am calling

http://slave_host:8983/solr/replication?command=details

and comparing the index version on the slave and on the master, putting the
instance back in service when the two are the same. It works fine, except
that the response documentation says the structure of the response may
change.

Is this the right way to do it?
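The polling loop described above can be sketched as follows. The version fetchers are passed in as callables so a real implementation could plug in HTTP calls to /solr/replication?command=details on each host (hypothetical wiring; as noted, the structure of that response is not guaranteed to be stable):

```python
import time

def wait_for_replication(get_master_version, get_slave_version,
                         poll_seconds=0.0, timeout_polls=100):
    """Poll until the slave's index version matches the master's.

    Returns True once the versions agree, or False if they still differ
    after timeout_polls attempts. Sketch only: real code would also need
    error handling around the HTTP calls.
    """
    for _ in range(timeout_polls):
        master, slave = get_master_version(), get_slave_version()
        if slave == master:
            return True
        time.sleep(poll_seconds)
    return False

# Simulated example: the slave catches up on the third poll.
versions = iter([3, 4, 5, 5])
ok = wait_for_replication(lambda: 5, lambda: next(versions))
print(ok)  # True
```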

Thanks,


2009/7/21 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 on the slave this command would not work well. The indexversion is not
 the actual index version. It is the current replicateable index
 version.

 why do you call that API directly?


 On Tue, Jul 21, 2009 at 12:53 AM, solr jaysolr...@gmail.com wrote:
  If you ask for the index version of a slave instance, you always get
 version
  number being 0. Is it expected behavior?
 
  I am using this url
 
  http://slave_host:8983/solr/replication?command=indexversion
 
  This request returns correct version on master.
 
  If you use the 'details' command, you get the right version number (and
  generation number, and it gives more than what you want).
 
  Thanks,
 
  --
  J
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
J


Re: Solr and UIMA

2009-07-21 Thread JCodina

Hello, Grant,

There are two ways to implement this: one is payloads, and the other is
multiple tokens at the same position.
Each of them can be useful; let me explain the way I think they can be used.

Payloads: every token has extra information that can be used in processing.
For example, if I can add part-of-speech tags, then I can develop tokenizers
that take the POS into account (for example, I can generate bigrams of
Noun Adjective, or Noun prep Noun, or I can have a better stopwords
algorithm).

Multiple tokens in one position: if I can have different tokens at the same
place, I can carry different kinds of information, like: was #verb _be. So I
can do a search for "you _be #adjective" to find all the sentences that say
something about "you", for example "you were clever", "you are tall", ...

I have not understood how the DelimitedPayloadTokenFilterFactory works in
Solr; what is its input format?

So I was thinking of generating XML where, for each token, a single string
is generated, like was#verb#be, and then a token filter splits each
whitespace-separated string on '#' (in this case into three words) and adds
the marker character that allows searching for the right semantic info, but
gives them the same position increment. Of course, the full processing chain
must be aware of this.
But I still must think about multiword tokens.
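The splitting idea above can be sketched as producing (token, position-increment) pairs: the surface form gets increment 1, and the POS and lemma sub-tokens get increment 0 so all three share one position. A sketch of the idea only, not a real Solr TokenFilter:

```python
def split_annotated_token(word):
    """Split a 'surface#pos#lemma' token into sub-tokens sharing one
    position: the surface form gets position increment 1, and the POS
    and lemma (marked with '#' and '_', per the convention in the mail)
    get increment 0.
    """
    surface, pos, lemma = word.split("#")
    return [(surface, 1), ("#" + pos, 0), ("_" + lemma, 0)]

print(split_annotated_token("was#verb#be"))
# [('was', 1), ('#verb', 0), ('_be', 0)]
```

With all three sub-tokens at the same position, a phrase query like "you _be #adjective" can then match "you were clever".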


Grant Ingersoll-6 wrote:
 
 
 On Jul 20, 2009, at 6:43 AM, JCodina wrote:
 
 D: Break things down. The CAS would only produce XML that solr can  
 process.
 Then different Tokenizers can be used to deal with the data in the  
 CAS. the
 main point is that the XML has  the doc and field labels of solr.
 
 I just committed the DelimitedPayloadTokenFilterFactory, I suspect  
 this is along the lines of what you are thinking, but I haven't done  
 all that much with UIMA.
 
 I also suspect the Tee/Sink capabilities of Lucene could be helpful,  
 but they aren't available in Solr yet.
 
 
 
 
 E: The set of capabilities to process the xml is defined in XML,  
 similar to
 lucas to define the ouput and in the solr schema to define how this is
 processed.


 I want to use it in order to index something that is common but I  
 can't get
 any tool to do that with sol: indexing a word and coding at the same
 position the syntactic and semantic information. I know that in  
 Lucene this
 is evolving and it will be possible to include metadata but for the  
 moment
 
 What does Lucas do with Lucene?  Is it putting multiple tokens at the  
 same position or using Payloads?
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-and-UIMA-tp24567504p24590509.html
Sent from the Solr - User mailing list archive at Nabble.com.



Synonyms.txt and index_synonyms.txt

2009-07-21 Thread Francis Yakin

Does anyone know the difference between these two?

From the schema.xml we have:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Do you know if we need both of them for search to work?

Thanks

Francis



Re: Synonyms.txt and index_synonyms.txt

2009-07-21 Thread Otis Gospodnetic

Hi Francis,

The names of the synonyms files are arbitrary, but whatever you call them
needs to match what you have in schema.xml.
If you are referring to them, then they should exist, and they should
probably be non-empty.

But think this through a bit, because it seems like the index-time vs.
query-time synonym distinction is still a bit fuzzy for you.  The wiki has a
good page on that.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, July 21, 2009 1:50:43 PM
 Subject: Synonyms.txt and index_synonyms.txt
 
 
 Do you anyone the differences between these two?
 
 From the schema.xml
 
 We have:
 
 
   
 
 
  [fieldType definition elided; it is quoted in full in the message above]
 
   
 
 
 Do you know if we need both of them for search to be working?
 
 Thanks
 
 Francis



Storing string field in solr.ExternalFileField type

2009-07-21 Thread Jibo John

We're in the process of building a log searcher application.

In order to reduce the index size and improve query performance, we're
exploring the possibility of having:

 1. One field for each log line with 'indexed=true stored=false' that will
be used for searching.
 2. Another field for each log line of type solr.ExternalFileField that
will be used only for display purposes.

We realized that currently solr.ExternalFileField supports only the float
type.

Is there a way we can override this to support the string type? Any issues
with this approach?
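Since ExternalFileField holds only floats, one workaround (a sketch of an idea, not a Solr feature) is to keep the display-only text outside Solr entirely, keyed by the document's uniqueKey, and join at render time. All names here are hypothetical:

```python
# Sketch: Solr indexes (but does not store) the log line; the raw line
# lives in an external key-value store (here just a dict, but it could
# be a flat file or a DB) keyed by the document id.

external_store = {}  # doc_id -> raw log line

def index_log_line(doc_id, raw_line):
    external_store[doc_id] = raw_line
    # ...here one would also send {"id": doc_id, "line": raw_line} to
    # Solr, with the line field configured indexed=true stored=false...

def render_results(doc_ids):
    """Given ids returned by a Solr query, fetch display text externally."""
    return [external_store[d] for d in doc_ids]

index_log_line("log-1", "2009-07-21 ERROR disk full")
index_log_line("log-2", "2009-07-21 INFO startup complete")
print(render_results(["log-2"]))  # ['2009-07-21 INFO startup complete']
```

This keeps the index small while still returning the original text, at the cost of a second lookup per result.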


Any ideas are welcome.


Thanks,
-Jibo




Re: Lemmatisation support in Solr

2009-07-21 Thread Benson Margulies
There are for-money solutions to this.

On Tue, Jul 21, 2009 at 10:04 AM, Grant Ingersollgsing...@apache.org wrote:
 Sounds like you need a TokenFilter that does lemmatisation.  I don't know of
 any open ones off hand, but I haven't looked all that hard.

 On Jul 21, 2009, at 4:25 AM, prerna07 wrote:


 Hi,

 I am implementing Lemmatisation in Solr, which means if user looks for
 Mouse then it should display results of Mouse and Mice both. I
 understand that this is something context search. I think of using synonym
 for this but then synonyms.txt will be having so many records and this
 will keep on adding.

 Please suggest how I can implement it in some other way.

 Thanks,
 Prerna
 --
 View this message in context:
 http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583655.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
 http://www.lucidimagination.com/search




Re: All in one index, or multiple indexes?

2009-07-21 Thread Jim Adams
It will depend on how much total volume you have.  If you are discussing
millions and millions of records, I'd say use multicore and shards.

On Wed, Jul 8, 2009 at 5:25 AM, Tim Sell trs...@gmail.com wrote:

 Hi,
 I am wondering if it is common to have just one very large index, or
 multiple smaller indexes specialized for different content types.

 We currently have multiple smaller indexes, although one of them is
 much larger than the others. We are considering merging them, to allow
 the convenience of searching across multiple types at once and getting
 them back in one list. The largest of the current indexes has a couple
 of types that belong together; it has just one text field, and it is
 usually quite short and similar to product names (words like "The
 matter"). Another index I would merge with this one has multiple text
 fields (also quite short).

 We would of course still like to be able to get specific types. Is
 filtering on just one type a big performance hit compared to
 querying it from its own index? Bear in mind all these indexes
 run on the same machine. (We replicate them all to three machines and
 do load balancing.)

 There are a number of considerations. From an application standpoint,
 when querying across all types we may split the results out into the
 separate types anyway once we have the list back. If we always do
 this, is it silly to have them in one index, rather than querying
 multiple indexes at once? Are multiple HTTP requests less significant
 than the time to split the results afterwards?

 In some ways it is easier to maintain a single index, although it has
 felt easier to optimize the results for the type of content if they
 are in separate indexes. My main concern with putting it all in one
 index is that we'll make it harder to work with. We will definitely
 want to do filtering on types sometimes, and if we go with a mashed-up
 index I'd prefer not to maintain separate specialized indexes as well.

 Any thoughts?

 ~Tim.



solr 1.3.0 and Oracle Fusion Middleware

2009-07-21 Thread Hall, David
Trying to install Solr for a project.  We currently have a 10.1.3 Oracle J2EE
install, which I believe satisfies the Solr requirements.  I have the war file
deployed and it appears to be half working, but I get errors with the .css file
when hitting the admin page.

Has anyone else been successful putting Solr on Oracle's Java containers, and
are there any pointers?

-Any help would be greatly appreciated.

Dave


FATAL: Solr returned an error: Invalid_Date_String

2009-07-21 Thread Mick England

Hi,

I have the following tag in my xml files:

<field name="timestamp">2009-05-06</field>

When I try posting the file I get this error:

FATAL: Solr returned an error: Invalid_Date_String20090506

My schema.xml file has this:

   <field name="timestamp" type="date" indexed="true" stored="true"
          default="NOW" multiValued="false"/>

How do I specify a correct date string?

-- 
View this message in context: 
http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24594686.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: FATAL: Solr returned an error: Invalid_Date_String

2009-07-21 Thread Andrew McCombe
Hi

Dates must be in ISO 8601 format:

http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

e.g. 1995-12-31T23:59:59Z
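Converting a plain date like the one in the question into Solr's expected form could look like this (a small sketch; a date with no time component becomes midnight UTC):

```python
from datetime import datetime

def to_solr_date(date_str: str) -> str:
    """Convert 'YYYY-MM-DD' to Solr's ISO 8601 form with a 'Z' suffix."""
    d = datetime.strptime(date_str, "%Y-%m-%d")
    return d.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2009-05-06"))  # 2009-05-06T00:00:00Z
```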

Hope this helps

Andrew McCombe


2009/7/21 Mick England mic...@mac.com


 Hi,

 I have the following tag in my xml files:

 <field name="timestamp">2009-05-06</field>

 When I try posting the file I get this error:

 FATAL: Solr returned an error: Invalid_Date_String20090506

 My schema.xml file has this:

   <field name="timestamp" type="date" indexed="true" stored="true"
          default="NOW" multiValued="false"/>

 How do I specify a correct date string?

 --
 View this message in context:
 http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24594686.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: FATAL: Solr returned an error: Invalid_Date_String

2009-07-21 Thread Mick England

Thanks for the quick response. That worked for me.


Andrew McCombe wrote:
 
 Dates must be in ISO 8601 format:
 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 
 e.g 1995-12-31T23:59:59Z
 

-- 
View this message in context: 
http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24595148.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3.0 and Oracle Fusion Middleware

2009-07-21 Thread Mark Miller
What are the errors you see?

On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote:

 Trying to install SOLR for a project.  Currently we have a 10.1.3 Oracle
 J2EE install.  I believe it satisfies the SOLR requirements.   I have the
 war file deployed and it appears to be ½ working, but have errors with the
 .css file when hitting the admin page.

 Anyone else been successful putting SOLR on Oracle's Java Containers and
 are there any pointers???

 -Any help would be greatly appreciated.

 Dave




-- 
-- 
- Mark

http://www.lucidimagination.com


RE: solr 1.3.0 and Oracle Fusion Middleware

2009-07-21 Thread Hall, David
Jul 20, 2009 2:45:34 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.StackOverflowError
at java.util.Properties.getProperty(Properties.java:774)
at 
com.evermind.server.ApplicationServerSystemProperties.getProperty(ApplicationServerSystemProperties.java:43)
at java.lang.System.getProperty(System.java:629)
at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:66)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(PrintWriter.java:77)
at java.io.PrintWriter.<init>(PrintWriter.java:61)
at org.apache.solr.common.SolrException.toStr(SolrException.java:160)
at org.apache.solr.common.SolrException.log(SolrException.java:132)
at org.apache.solr.common.SolrException.logOnce(SolrException.java:150)
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:319)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:281)
... and the errors continues...

Any help appreciated.

Dave

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, July 21, 2009 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: solr 1.3.0 and Oracle Fusion Middleware

What are the errors you see?

On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote:

 Trying to install SOLR for a project.  Currently we have a 10.1.3 Oracle
 J2EE install.  I believe it satisfies the SOLR requirements.   I have the
 war file deployed and it appears to be ½ working, but have errors with the
 .css file when hitting the admin page.

 Anyone else been successful putting SOLR on Oracle's Java Containers and
 are there any pointers???

 -Any help would be greatly appreciated.

 Dave




-- 
-- 
- Mark

http://www.lucidimagination.com


Random Slowness

2009-07-21 Thread Jeff Newburn
We are experiencing random slowness on certain queries.  I have been unable
to diagnose what the issue is.  We are using SOLR 1.4 and 99.99% of queries
return in under 250 ms.  The remaining queries are returning in 2-5 seconds
for no apparent reason.  There does not seem to be any commonality between
the queries.  This problem also includes admin system queries.  Any help or
direction would be much appreciated.

Specs:
Solr 1.4
Tomcat Server
4 cores
Largest core 155,000 documents.

Logs:
INFO: [zeta-main] webapp=null path=null params={command=details} status=0
QTime=1276 
INFO: [zeta-main] webapp=null path=null params={command=details} status=0
QTime=1144 
INFO: [zeta-main] webapp=null path=null params={command=details} status=0
QTime=1285 
INFO: [zeta-main] webapp=/solr path=/select params={facet=true&facet.mincount=1&facet.limit=-1&wt=javabin&rows=0&facet.sort=true&start=0&q=shoes&facet.field=colorFacet&facet.field=brandNameFacet&facet.field=heelHeight&facet.field=attrFacet_Style&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=categoryFacet:Sandals&fq=width:EE&fq=size:10.5&fq=priceFacet:$100.00+and+Under&fq=personalityFacet:Sexy} hits=19 status=0 QTime=3689
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=size:8&fq=width:D&fq=brandNameFacet:Dansko} hits=8 status=0 QTime=3566
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=gender:Womens&fq=productTypeFacet:Shoes&fq=subCategoryFacet:Heels&fq=categoryFacet:Shoes&fq=size:10} hits=5409 status=0 QTime=3348
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=100&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=personalityFacet:Dress&fq=categoryFacet:Shoes&fq=size:10&fq=heelHeight:Medium+(1+3/8in+-+2+1/2in)} hits=1129 status=0 QTime=3285
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=200&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=categoryFacet:Shoes&fq=subCategoryFacet:Heels&fq=personalityFacet:Dress&fq=attrFacet_Style:Pump&fq=size:5} hits=644 status=0 QTime=3750
INFO: [6pm-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=expandedGender:Kids&fq=productTypeFacet:Shoes&fq=gender:girls&fq=brandNameFacet:UGG+Kids} hits=17 status=0 QTime=3789

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



Re: expand synonyms without tokenizing stream?

2009-07-21 Thread Chris Hostetter

: I'd like to take keywords in my documents, and expand them as synonyms; for
: example, if the document gets annotated with a keyword of 'sf', I'd like
: that to expand to 'San Francisco'.  (San Francisco,San Fran,SF is a line in
: my synonyms.txt file).
: 
: But I also want to be able to display facets with counts for these keywords;
: I'd like them to be suitable for display.
...

: I've also done a copyfield to a 'KeywordsString' field, which is
: defined as string. i.e.
: 
: <fieldType name="string" class="solr.StrField" sortMissingLast="true"
: omitNorms="true"/>

It sounds like you are on the right track ... the key is: search on a field
with synonyms expanded (at index time), and facet on a field with synonyms
collapsed (at index time).

Try changing the fieldtype you facet on to be a TextField with the
KeywordTokenizer, and then use the SynonymFilter on it ... that should
work (but I haven't tried it).

If you format your synonyms file properly (commas instead of arrows), you
can use the exact same file for both fieldtypes, even though one will
expand, and the other will collapse.
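Putting that advice together, the facet-side type might look something like the following (a sketch only, untested, as noted above; the search-side type would use the same comma-separated synonyms file, e.g. a line like `San Francisco,San Fran,SF`, but with expand="true"):

```xml
<!-- Facet-side type: KeywordTokenizer keeps the whole value as one
     token, and expand="false" collapses each synonym group to its first
     entry so facet counts merge. Sketch, not verified config. -->
<fieldType name="keywordSynonym" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```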


-Hoss



Re: solr Analyzer help

2009-07-21 Thread Chris Hostetter

Any Lucene analyzer that has a no-arg constructor can be used in Solr;
just specify it by full class name (there is an example of this in the
example schema.xml).

Any Tokenizer/TokenFilter that exists in the Lucene distribution also gets
a Factory in Solr (unless someone forgets); you can use these Factories if
you want to mix and match.

: I also see stem filter factories and plain filter factories for some
: languages, like
: DutchStemFilterFactory, BrazilianStemFilterFactory,
: GermanStemFilterFactory, etc.,
: and the plain filters like ChineseFilterFactory.
: 
: What does the stem filter factory do? Does it stem the words without
: including the snowball porter filter factory?

They are factories for the corresponding filters ... you should look at
the docs for those Filters to understand what they do (the Factories are
just simple, dumb APIs for generating instances of the Filters when
configured in the schema.xml).



-Hoss



Re: solr 1.3.0 and Oracle Fusion Middleware

2009-07-21 Thread Mark Miller
Thanks. Check out this thread:
http://www.lucidimagination.com/search/document/b15c06f78820d1da/weblogic_10_compatibility_issue_stackoverflowerror
and this wikipage: http://wiki.apache.org/solr/SolrWeblogic

If it helps, please add to our wiki - if not, we can dig deeper.

Thanks,

-- 
- Mark

http://www.lucidimagination.com


On Tue, Jul 21, 2009 at 6:01 PM, Hall, David dh...@vermeer.com wrote:

 Jul 20, 2009 2:45:34 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.StackOverflowError
at java.util.Properties.getProperty(Properties.java:774)
at
 com.evermind.server.ApplicationServerSystemProperties.getProperty(ApplicationServerSystemProperties.java:43)
at java.lang.System.getProperty(System.java:629)
at
 sun.security.action.GetPropertyAction.run(GetPropertyAction.java:66)
at java.security.AccessController.doPrivileged(Native Method)
        at java.io.PrintWriter.<init>(PrintWriter.java:77)
        at java.io.PrintWriter.<init>(PrintWriter.java:61)
at
 org.apache.solr.common.SolrException.toStr(SolrException.java:160)
at org.apache.solr.common.SolrException.log(SolrException.java:132)
at
 org.apache.solr.common.SolrException.logOnce(SolrException.java:150)
at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:319)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:281)
 ... and the errors continues...

 Any help appreciated.

 Dave

 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, July 21, 2009 4:55 PM
 To: solr-user@lucene.apache.org
 Subject: Re: solr 1.3.0 and Oracle Fusion Middleware

 What are the errors you see?

 On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote:

  Trying to install SOLR for a project.  Currently we have a 10.1.3 Oracle
  J2EE install.  I believe it satisfies the SOLR requirements.   I have the
  war file deployed and it appears to be half working, but I have errors
  with the .css file when hitting the admin page.
 
  Anyone else been successful putting SOLR on Oracle's Java Containers and
  are there any pointers???
 
  -Any help would be greatly appreciated.
 
  Dave
 



 --
 --
 - Mark

 http://www.lucidimagination.com



Re: DutchStemFilterFactory reducing double vowels bug ?

2009-07-21 Thread Chris Hostetter

: Some time ago I configured my Solr instance to use the
: DutchStemFilterFactory.
...
: Words like 'baas', 'paas', 'maan', 'boom' etc. are indexed as 'bas',
: 'pas', 'man' and 'bom'. Those words have a meaning of their own. Am I
: missing something, or is this to be considered a bug?

I know nothing about Dutch, but the DutchStemFilterFactory is just a 
factory for the DutchStemFilter, which is just a Lucene TokenFilter 
around the DutchStemmer, which is a Java impl of this algorithm...

http://snowball.tartarus.org/algorithms/dutch/stemmer.html

...according to that page, Step 4 explicitly includes a 
reduction of doubled vowels (maan -> man is an explicit example)

so the code seems to be working as specified .. whether it's what you 
*want* is a different question.
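If that behavior is not what you want for particular words, one possible 
workaround (hedged: whether this is supported depends on your Solr 
version) is to use the Snowball-based factory with a protected-words file, 
which exempts listed words from stemming entirely; the file name here is 
the conventional one from the example config:

```xml
<!-- words listed in protwords.txt (e.g. maan, boom) bypass the stemmer -->
<filter class="solr.SnowballPorterFilterFactory" language="Dutch"
        protected="protwords.txt"/>
```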


-Hoss



Re: Deleting from SolrQueryResponse

2009-07-21 Thread Chris Hostetter

: Okay. So still, how would I go about creating a new DocList and DocSet as
: they cannot be instantiated?

DocLists and DocSets are retrieved from the SolrIndexSearcher as results 
from searches.  a simple javadoc search for the usages of the DocList and 
DocSet APIs would have given you this answer.



-Hoss



Re: Regarding Response Builder

2009-07-21 Thread Chris Hostetter

: SolrParams params = req.getParams();
: 
: Now I want to get the values of those params. What should be the 
: approach as SolrParams is an abstract class and its get(String) method 
: is abstract?

your question seems to be more about Java basics than about using Solr -- 
it doesn't matter if SolrParams is abstract: 
any method (including req.getParams()) which says it returns an instance of 
SolrParams is required to do just that -- return an instance.  the 
SolrParams API contract guarantees that you can call get(String) on any 
instance.
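A minimal, self-contained sketch of the point (these are made-up stand-ins, 
not Solr's actual classes): the caller always receives some concrete 
subclass of the abstract type and simply invokes the abstract method on it.

```java
import java.util.HashMap;
import java.util.Map;

// stand-in for an abstract API like SolrParams
abstract class Params {
    public abstract String get(String name);
}

// one possible concrete implementation, backed by a Map
class MapParams extends Params {
    private final Map<String, String> map;
    MapParams(Map<String, String> map) { this.map = map; }
    public String get(String name) { return map.get(name); }
}

public class ParamsDemo {
    // plays the role of req.getParams(): declared to return the abstract
    // type, but always hands back a concrete instance
    static Params getParams() {
        Map<String, String> m = new HashMap<String, String>();
        m.put("q", "dog");
        return new MapParams(m);
    }

    public static void main(String[] args) {
        Params params = getParams();
        // calling the abstract method works because the runtime object
        // is a concrete MapParams
        System.out.println(params.get("q")); // prints "dog"
    }
}
```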



-Hoss



Re: Solrj, tomcat and a proxy

2009-07-21 Thread Chris Hostetter

: Subject: Solrj, tomcat and a proxy
: References: 2aa3aff80907130547y124d433chec4f4bcbbfb35...@mail.gmail.com
: In-Reply-To: 2aa3aff80907130547y124d433chec4f4bcbbfb35...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss



Re: Merge Policy

2009-07-21 Thread Chris Hostetter

: SolrIndexConfig accepts a mergePolicy class name, however how does one
: inject properties into it?

At the moment you can't.  

If you look at the history of MergePolicy, users have never been 
encouraged to implement their own (the API actively discourages it, 
without going so far as to make it impossible).


-Hoss



Solr index as multiple separate index directories

2009-07-21 Thread Jason Rutherglen
I'd like to be able to define within a single Solr core, a set
of indexes in multiple directories. This is really useful for
indexing in Hadoop or integrating with Katta where an
EmbeddedSolrServer is distributed to the Hadoop cluster and
indexes are generated in parallel and returned to Solr slave
servers. It seems like this could be done using a custom
IndexReaderFactory that opens a MultiReader over the
directories. SolrIndexWriter usage in this context would be
limited to incremental updates (if anything).

It would be great for Solr docSet caching to operate at the
SegmentReader level so the small incremental updates don't cause
a massive cache regeneration. Maybe there's a way to trick Solr
into doing this today by using multiple EmbeddedSolrServer
instances for each large segment/shard, and executing a local
distributed query to them? This way each EmbeddedSolrServer
maintains caches that are not disturbed by shard updates.
Ideally if I had to use multiple cores, I'd rather not have to
maintain separate instances of /conf on disk but could pass the
same in-memory rep of solrconfig and schema into the core?


Re: lucene or Solr bug with dismax?

2009-07-21 Thread Chris Hostetter
: Indeed - I assumed that only the + and - characters had any
: special meaning when parsing dismax queries and that all other content
: would be treated just as keywords.  That seems to be how it's
: described in the dismax documentation?

The dirty little secret of the dismax parser is that i was an idiot when i 
wrote it.

I was working on a project that needed a parser that would support +/-, and 
wanted to try the DisjunctionMaxQuery expansion of the terms that 
DisMaxParser now supports.  

I started by attempting to tackle the DisjunctionMaxQuery expansion in a 
subclass of the existing QueryParser with every intention of throwing it 
away once it was working. This was because i needed a quick proof of 
concept that demonstrated the dismax query structures produced were 
actually useful, so far i'd only tested a few hardcoded example queries, 
and i needed the parser support so i could run some regression tests over 
existing *WELL FORMED* queries to compare the relevancy results.

It worked great: i successfully demonstrated to the right people that the 
query structures made sense for all our use cases.  So then i put a lower 
priority item on my todo list / schedule to figure out the right way to 
implement a DisMaxParser so i wasn't stuck with any of the error code 
paths in the QueryParser superclass.

I'm not sure if i was really tired when i finally got to looking at it, or 
if i was just really distracted, but i distinctly remember testing queries 
that had "and" and "or" in them and seeing them get parsed the way i 
wanted: the words were treated as literals and incorporated into the 
DisjunctionMaxQuery structure. So i guess i assumed something about how i 
subclassed QueryParser was bypassing the normal and/or logic, and i 
decided my quick and dirty subclass would work well enough.

The key thing to note here is that i remember testing "and" and "or" ... 
not "AND" and "OR" ... for some reason or another i was totally brain dead 
and tested the wrong thing.  had i tested the right thing, i probably 
would have decided i needed to write a new parser from scratch, and had 
the time to work it into the project schedule.

alas: 20/20 hindsight.



-Hoss



Re: Merge Policy

2009-07-21 Thread Jason Rutherglen
I am referring to setting properties on the *existing* policy
available in Lucene such as LogByteSizeMergePolicy.setMaxMergeMB

On Tue, Jul 21, 2009 at 5:11 PM, Chris
Hostetterhossman_luc...@fucit.org wrote:

 : SolrIndexConfig accepts a mergePolicy class name, however how does one
 : inject properties into it?

 At the moment you can't.

 If you look at the history of MergePolicy, users have never been
 encouraged to implement their own (the API actively discourages it,
 without going so far as to make it impossible).


 -Hoss




how to change the size of fieldValueCache in solr?

2009-07-21 Thread shb
The fieldValueCache plays an important role in sorting and faceting in Solr,
but this cache does not seem to be managed by Solr --
is there any way to configure it? thanks!


Re: how to change the size of fieldValueCache in solr?

2009-07-21 Thread Otis Gospodnetic

Hello,

You can control it in solrconfig.xml:

<!-- Cache used to hold field values that are quickly accessible
     by document id.  The fieldValueCache is created by default
     even if not configured here.
-->
<fieldValueCache
    class="solr.FastLRUCache"
    size="512"
    autowarmCount="128"
    showItems="32"
/>


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: shb suh...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Wednesday, July 22, 2009 12:29:43 AM
 Subject: how to change the size of fieldValueCache in solr?
 
 The fieldValueCache plays an important role in sorting and faceting in Solr,
 but this cache does not seem to be managed by Solr --
 is there any way to configure it? thanks!



Re: how to change the size of fieldValueCache in solr?

2009-07-21 Thread shb
Thanks very much.  Is there any difference between fieldValueCache and
fieldCache?


Re: Regarding Response Builder

2009-07-21 Thread pof

I would just do something like this:

String myParam = req.getParams().get("xparam");

where xparam is passed on the query's query string:

http://localhost:8983/solr/select/?q=dog&xparam=something&start=0&rows=10&indent=on


Kartik1 wrote:
 
 The ResponseBuilder class has SolrQueryRequest as a public field. Using
 SolrQueryRequest we can get the SolrParams like
 
 SolrParams params = req.getParams();
 
 Now I want to get the values of those params. What should be the approach
 as SolrParams is an abstract class and its get(String) method is abstract?
 
 Best regards,
 Amandeep Singh
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Regarding-Response-Builder-tp24456722p24600481.html
Sent from the Solr - User mailing list archive at Nabble.com.