Re: replication -- missing field data file

2010-01-11 Thread Shalin Shekhar Mangar
On Thu, Jan 7, 2010 at 9:34 PM, Giovanni Fernandez-Kincade 
gfernandez-kinc...@capitaliq.com wrote:

 Right, but if you want to take periodic backups and ship them to tape or
 some DR site, you need to be able to tell when the backup is actually
 complete.

 It seems very strange to me that you can actually track the replication
 progress on a slave, but you can't track the backup progress on a master.


You are right. This can be improved. See
https://issues.apache.org/jira/browse/SOLR-1714

-- 
Regards,
Shalin Shekhar Mangar.


Re: Adaptive search?

2010-01-11 Thread Shalin Shekhar Mangar
On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:


 - Original Message 

  From: Shalin Shekhar Mangar shalinman...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wed, December 23, 2009 2:45:21 AM
  Subject: Re: Adaptive search?
 
  On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote:
 
   Nice!
  
   Siddhant: Another problem to watch out for is the feedback problem:
   someone clicks on a link and it automatically becomes more
   interesting, so someone else clicks, and it gets even more
   interesting... So you need some kind of suppression. For example, as
   individual clicks get older, you can push them down. Or you can put a
   cap on the number of clicks used to rank the query.
  
  
  We use clicks/views instead of just clicks to avoid this problem.

 Doesn't a click imply a view?  You click to view.  I must be missing
 something...


I was talking about boosting documents using past popularity. So a user
searches for X and gets 10 results. This view is recorded for each of the 10
documents and added to the index later. If a user clicks on result #2, the
click is recorded for doc #2 and added to the index. We boost using clicks/views.
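To make the idea concrete, here is a small sketch of the clicks/views boost described above. Everything here is illustrative (the class name and smoothing constants are made up); in a real setup the recorded counts would live in indexed fields and the boost would be applied, e.g., via a function query:

```java
/** Sketch of a click-through-rate boost computed from recorded
 *  clicks and views, with smoothing. Illustrative only. */
public class CtrBoost {
    /** Returns a multiplicative boost based on click-through rate.
     *  The smoothing constants keep a single early click on a
     *  barely-viewed document from dominating the ranking. */
    static double boost(long clicks, long views) {
        double ctr = (clicks + 1.0) / (views + 10.0);
        return 1.0 + ctr; // boost stays a modest multiplier
    }
}
```

The smoothing is one way to address the feedback loop Lance describes: popularity can only grow as fast as views grow, not clicks alone.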

-- 
Regards,
Shalin Shekhar Mangar.


Re: Understanding the query parser

2010-01-11 Thread rswart

I am running into the same issue. I have tried to replace my
WhitespaceTokenizerFactory with a PatternTokenizerFactory with the pattern
(\s+|-), but I still seem to get a phrase query. Why is that?




Ahmet Arslan wrote:
 
 
 I am using Solr 1.3.
 I have an index with a field called name. It is of type
 text
 (unmodified, stock text field from solr).
 
 My query
 field:foo-bar
 is parsed as the phrase query
 field:"foo bar"
 
 I was rather expecting it to be parsed as
 field:(foo bar)
 or
 field:foo field:bar
 
 Is there an expectation mismatch? Can I make it work as I
 expect it to?
 
 If the query analyzer produces two or more tokens from a single token,
 QueryParser constructs PhraseQuery. Therefore it is expected. 
 
 Without writing custom code it seems impossible to alter this behavior.
 
 Modifying QueryParser to change this behavior will be troublesome. 
 I think the easiest way is to replace '-' with whitespace before the analysis
 phase. Probably on the client side, or in a custom RequestHandler.
 
 Maybe you can set qp.setPhraseSlop(Integer.MAX_VALUE); so that
 field:foo-bar and field:(foo AND bar) will be virtually equal.
 
 hope this helps.
 
 
   
 
 

-- 
View this message in context: 
http://old.nabble.com/Understanding-the-query-parser-tp27071483p27107523.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk
You could try to take the code for SynonymFilterFactory as a starting point, 
and adapt it to obtain the synonym configuration from another source than a 
text file.

But I'm not sure what you mean by checking for synonyms at query time. As I 
understand it, Solr works like that anyway - depending on how you configure it. 
The only difference between your new SynonymFilterFactory and Solr's default 
would be where it obtains the synonym configuration from.

You can get Solr to re-read the configuration by issuing a reload command. 
See http://wiki.apache.org/solr/CoreAdmin#RELOAD.
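For reference, the reload Peter mentions is a single HTTP call to the CoreAdmin handler; the host, port, and core name below are only illustrative:

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```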

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk


-Original Message-
From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] 
Sent: 10. januar 2010 16:20
To: solr-user@lucene.apache.org
Subject: Synonyms from Database

Hi :
 Is there any work done in providing synonyms from a database instead of
synonyms.txt file ? Idea is to have a dictionary in DB that can be enhanced
on the fly in the application. This can then be used at query time to check
for synonyms.

I know I am not putting thoughts to the performance implications of this
approach, but will love to hear about others thoughts.

~Ravi.



Re: Synonyms from Database

2010-01-11 Thread Ravi Gidwani
Thanks all for your replies.

I guess what I meant by query time is this: as I understand Solr (and I may be
wrong here), I can add synonyms.txt in the query analyzer as follows:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>

By this my understanding is, even if a document contains the word
"mathematics" at index time and my synonyms.txt file has:

mathematics => math, maths

a query for "math" will match "mathematics", since we have synonyms.txt
in the query analyzer. So I was curious about a database approach along
similar lines.

I get the point about performance, and I think that is a big no-no for this
approach. But the idea was to allow changing the synonyms on the fly (more
like adaptive synonyms) and improve the hits.

I guess the only way is to rewrite the file (as Otis suggested) and reload the
configuration (as Peter suggested). This might be a performance hit (rewrite
the file and reload), but I guess it is still much better than reading from
the DB?
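As a sketch of the dump-to-file route Otis suggests (all names here are hypothetical; there is no such helper in Solr), the only Solr-specific part is producing the mapping syntax synonyms.txt expects. The rows themselves could come from any JDBC query, after which the file is written and the core reloaded:

```java
import java.util.Map;

/** Sketch: turn DB rows (term -> comma-separated synonyms) into the
 *  synonyms.txt explicit-mapping format Solr expects. In a real setup
 *  the map would be filled from JDBC, e.g.
 *  SELECT term, synonyms FROM synonym_table, and the result written to
 *  synonyms.txt, followed by a core RELOAD. */
public class SynonymDump {
    static String toSynonymsTxt(Map<String, String> rows) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            // e.g. "mathematics => math, maths"
            sb.append(e.getKey()).append(" => ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```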

Thanks again for your comments.

~Ravi.


2010/1/10 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
  Ravi,
 
  I think if your synonyms were in a DB, it would be trivial to
 periodically dump them into a text file Solr expects.  You wouldn't want to
 hit the DB to look up synonyms at query time...
 Why query time. Can it not be done at startup time ?
 
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
  - Original Message 
  From: Ravi Gidwani ravi.gidw...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, January 9, 2010 10:20:18 PM
  Subject: Synonyms from Database
 
  Hi :
   Is there any work done in providing synonyms from a database
 instead of
  synonyms.txt file ? Idea is to have a dictionary in DB that can be
 enhanced
  on the fly in the application. This can then be used at query time to
 check
  for synonyms.
 
  I know I am not putting thoughts to the performance implications of this
  approach, but will love to hear about others thoughts.
 
  ~Ravi.
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Re: Adaptive search?

2010-01-11 Thread Ravi Gidwani
Shalin:
   Can you point me to pages/resources that describe this approach
in detail? Or can you provide more details on the schema and the
function used for ranking the documents?

Thanks,
~Ravi.

On Mon, Jan 11, 2010 at 1:00 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com
  wrote:

 
  - Original Message 
 
   From: Shalin Shekhar Mangar shalinman...@gmail.com
   To: solr-user@lucene.apache.org
   Sent: Wed, December 23, 2009 2:45:21 AM
   Subject: Re: Adaptive search?
  
   On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote:
  
Nice!
   
Siddhant: Another problem to watch out for is the feedback problem:
someone clicks on a link and it automatically becomes more
interesting, so someone else clicks, and it gets even more
interesting... So you need some kind of suppression. For example, as
individual clicks get older, you can push them down. Or you can put a
cap on the number of clicks used to rank the query.
   
   
   We use clicks/views instead of just clicks to avoid this problem.
 
  Doesn't a click imply a view?  You click to view.  I must be missing
  something...
 
 
 I was talking about boosting documents using past popularity. So a user
 searches for X and gets 10 results. This view is recorded for each of the
 10
 documents and added to the index later. If a user clicks on result #2, the
 click is recorded for doc #2 and added to index. We boost using
 clicks/view.

 --
 Regards,
 Shalin Shekhar Mangar.



RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk
Hi - I don't think you'll see a performance hit using a DB for your synonym 
configuration as opposed to a text file. 

The configuration is only done once (at startup) - or when you reload. You 
won't be reloading every minute, will you? After reading the configuration, the 
synonyms are available to Solr via the SynonymFilter object (at least as I 
understand it from looking at the code).

The reload feature actually sounds quite neat - it will reload in the 
background, and switch in the newly read configuration when it's ready - so 
hopefully no down-time waiting for configuration.

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk


-Original Message-
From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] 
Sent: 11. januar 2010 22:43
To: solr-user@lucene.apache.org; noble.p...@gmail.com
Subject: Re: Synonyms from Database

Thanks all for your replies.

I guess what I meant by query time is this: as I understand Solr (and I may be
wrong here), I can add synonyms.txt in the query analyzer as follows:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>

By this my understanding is, even if a document contains the word
"mathematics" at index time and my synonyms.txt file has:

mathematics => math, maths

a query for "math" will match "mathematics", since we have synonyms.txt
in the query analyzer. So I was curious about a database approach along
similar lines.

I get the point about performance, and I think that is a big no-no for this
approach. But the idea was to allow changing the synonyms on the fly (more
like adaptive synonyms) and improve the hits.

I guess the only way is to rewrite the file (as Otis suggested) and reload the
configuration (as Peter suggested). This might be a performance hit (rewrite
the file and reload), but I guess it is still much better than reading from
the DB?

Thanks again for your comments.

~Ravi.


2010/1/10 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
  Ravi,
 
  I think if your synonyms were in a DB, it would be trivial to
 periodically dump them into a text file Solr expects.  You wouldn't want to
 hit the DB to look up synonyms at query time...
 Why query time. Can it not be done at startup time ?
 
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
  - Original Message 
  From: Ravi Gidwani ravi.gidw...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, January 9, 2010 10:20:18 PM
  Subject: Synonyms from Database
 
  Hi :
   Is there any work done in providing synonyms from a database
 instead of
  synonyms.txt file ? Idea is to have a dictionary in DB that can be
 enhanced
  on the fly in the application. This can then be used at query time to
 check
  for synonyms.
 
  I know I am not putting thoughts to the performance implications of this
  approach, but will love to hear about others thoughts.
 
  ~Ravi.
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com




Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:
The reload feature actually sounds quite neat - it will reload in  
the background, and switch in the newly read configuration when  
it's ready - so hopefully no down-time waiting for configuration.


Correct me if I'm wrong, but I don't think that's true about a
reload working in the background.  While a core is reloading (and
warming), it is unavailable for search, right?  I think you have to
create a new core, and then swap to keep things alive constantly.


Erik



Re: Synonyms from Database

2010-01-11 Thread Shalin Shekhar Mangar
On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote:


 On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:

 The reload feature actually sounds quite neat - it will reload in the
 background, and switch in the newly read configuration when it's ready -
 so hopefully no down-time waiting for configuration.


 Correct me if I'm wrong, but I don't think that it's true about a reload
 working in the background.  While a core is reloading (and warming), it is
 unavailable for search.  right?  I think you have to create a new core, and
 then swap to keep things alive constantly.


Core reload swaps the old core with a new core on the same configuration
files with no downtime. See CoreContainer#reload.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Understanding the query parser

2010-01-11 Thread Ahmet Arslan

 I am running in to the same issue. I have tried to replace
 my
 WhitespaceTokenizerFactory with a PatternTokenizerFactory
 with pattern
 (\s+|-) but I still seem to get a phrase query. Why is
 that?

It is in the source code of QueryParser's getFieldQuery(String field, String
queryText) method, line #660. If numTokens > 1, it returns a PhraseQuery.

Modifications in analysis phase (CharFilterFactory, TokenizerFactory, 
TokenFilterFactory) won't change this behavior. Something must be done before 
analysis phase.

But I think in your case you can obtain a match by modifying the parameters of
WordDelimiterFilterFactory, even with a PhraseQuery.
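For the client-side pre-processing Ahmet suggested earlier in the thread, a minimal sketch (the class name is hypothetical) is simply to rewrite '-' to whitespace before the query string ever reaches QueryParser, so "foo-bar" becomes two tokens instead of a single token that analysis later splits into a PhraseQuery:

```java
/** Sketch of the client-side workaround discussed in this thread:
 *  replace '-' with whitespace before sending the query to Solr. */
public class QueryPreprocessor {
    static String dehyphenate(String q) {
        return q.replace('-', ' ');
    }
}
```

Note this is a blunt rewrite; it would also affect hyphens you might want to keep (e.g. in quoted phrases).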


  


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread MitchK

Hello Hossman,

sorry for my late response.

For this specific case, you are right. It makes more sense to do such work
on the fly.
However, at the moment I am only testing what one can and cannot do with
Solr.

Is the UpdateProcessor something that comes from Lucene itself or from
Solr?

Thanks!


hossman wrote:
 
 
 : Is there a way to prepare a document the described way with Lucene/Solr,
 : before I analyze it?
 : My use case is to categorize several documents in an automatic way,
 which
 : includes that I have to create data from the given input doing some
 : information retrieval.
 
 As Ryan mentioned earlier: this is what the UpdateRequestProcessor API 
 is for -- it allows you to modify Documents (regardless of how they were 
 added: csv, xml, dih) prior to Solr processing them...
 
 http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html
 
 Personally, i think you may be looking at your problem from the wrong 
 direction...
 
 :  Imagine you would analyze, index and store them like you normally do
 and
 :  afterwards you want to set, whether the document belongs to the
 expensive
 :  item-group or not.
 :  If the price for the item is higher than 500$, it belongs to the
 :  expensive
 :  ones, otherwise not.
 
 ...for a situation like that, i wouldn't attempt to classify the docs as 
 expensive or cheap when adding them.  instead i would use numeric 
 ranges for faceting and filtering to show me how many docs were 
 expensive or cheap at query time -- that way when the economy tanks i 
 can redefine my definition of expensive on the fly w/o needing to 
 reindex a million documents.
 
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27109760.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multi language support

2010-01-11 Thread Daniel Persson
Hi Solr users.

I'm trying to set up a site with Solr search integrated, and I use the
Solr Java API to feed the index with search documents. At the moment I
have only activated search on the English portion of the site. I'm
interested in using as many features of Solr as possible. Synonyms,
stopwords, and stems all sound quite interesting and useful, but how do
I set this up in a good way for a multilingual site?

The site doesn't have a huge text mass, so performance issues don't
really bother me, but I'd still like to hear your suggestions before I
try to implement a solution.
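One common pattern is a separate field type per language, each with its own stopword list and stemmer, and per-language fields (e.g. title_en, title_de). The fragment below is only an illustration with made-up type and file names, not something prescribed in this thread:

```xml
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="text_de" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_de.txt" ignoreCase="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```

Synonym lists are language-specific too, so each type would get its own SynonymFilterFactory file if needed.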

Best regards

Daniel


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 7:33 AM, MitchK wrote:
Is the UpdateProcessor something that comes from Lucene itself or from
Solr?


It's at the Solr level - http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html 



Erik



Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 5:50 AM, Shalin Shekhar Mangar wrote:

On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher  
erik.hatc...@gmail.com wrote:




On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:

The reload feature actually sounds quite neat - it will reload in  
the
background, and switch in the newly read configuration when  
it's ready -

so hopefully no down-time waiting for configuration.



Correct me if I'm wrong, [me saying something wrong]


Core reload swaps the old core with a new core on the same  
configuration

files with no downtime. See CoreContainer#reload.


Sweet!  Thanks for the correction.

Erik



Re: Could not start SOLR issue

2010-01-11 Thread Grant Ingersoll

On Jan 11, 2010, at 1:38 AM, dipti khullar wrote:

 Hi
 
 We have been running master/slave Solr 1.3 in production for about 5
 months.
 
 Yesterday, we faced following issue on one of the slaves for the first time
 because of which we had to restart the slave.
 
 SEVERE: Could not start SOLR. Check solr/home property
 java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file
 found in 
 org.apache.lucene.store.FSDirectory@/opt/solr/solr_slave/solr/data/index:
 files: null

It looks like your index was removed out from under you.  Perhaps this is due 
to the failed snapshot install?

Can you replicate the problem?  Stopping the slave and deleting the index 
directory and then restarting it should resolve it for now.

 
 I searched on forums but couldn't find any relevant info which could have
 possibly caused the issue.
 
 In snapinstaller logs, following failed logs were observed:
 
 2010/01/11 04:20:06 started by solr
 2010/01/11 04:20:06 command:
 /opt/solr/solr_slave/solr/solr/bin/snapinstaller
 2010/01/11 04:20:07 installing snapshot
 /opt/solr/solr_slave/solr/data/snapshot.20100111041402
 2010/01/11 04:20:07 notifing Solr to open a new Searcher
 2010/01/11 04:20:07 failed to connect to Solr server
 2010/01/11 04:20:07 snapshot installed but Solr server has not open a new
 Searcher
 2010/01/11 04:20:08 failed (elapsed time: 1 sec)
 
 
 Configurations:
 There are 2 search servers in a virtualized VMware environment. Each has  2
 instances of Solr running on separates ports in tomcat.
 Server 1: hosts 1 master(application 1), 1 slave (application 1)
 Server 2: hosts 1 master (application 2), 1 slave (application 1)
 
 Both servers have 4 CPUs and 4 GB RAM.
 Master
 - 4GB RAM
 - 1GB JVM Heap memory is allocated to Solr
 Slave1/Slave2:
 - 4GB RAM
 - 2GB JVM Heap memory is allocated to Solr
 
 Can there be any possible reasons that solr/home property couldn't be found?
 
 Thanks
 Dipti



Re: Could not start SOLR issue

2010-01-11 Thread dipti khullar
We were able to resolve the problem by restarting the slave. Also, the
failed snapshot install incidents occurred after the exception was observed,
which seems logically consistent.
Could not start SOLR. Check solr/home property

We just want to avoid such instances in the future. Is it possible that at
any instant the solr/home property can get corrupted?

One more thing we observed was that tomcat-users.xml was overwritten. Should
we debug in that direction as well?

Thanks
Dipti

On Mon, Jan 11, 2010 at 6:55 PM, Grant Ingersoll gsing...@apache.org wrote:


 On Jan 11, 2010, at 1:38 AM, dipti khullar wrote:

  Hi
 
  We have been running master/slave Solr 1.3 in production for about 5
  months.
 
  Yesterday, we faced following issue on one of the slaves for the first
 time
  because of which we had to restart the slave.
 
  SEVERE: Could not start SOLR. Check solr/home property
  java.lang.RuntimeException: java.io.FileNotFoundException: no segments*
 file
  found in org.apache.lucene.store.FSDirectory@
 /opt/solr/solr_slave/solr/data/index:
  files: null

 It looks like your index was removed out from under you.  Perhaps this is
 due to the failed snapshot install?

 Can you replicate the problem?  Stopping the slave and deleting the index
 directory and then restarting it should resolve it for now.

 
  I searched on forums but couldn't find any relevant info which could have
  possibly caused the issue.
 
  In snapinstaller logs, following failed logs were observed:
 
  2010/01/11 04:20:06 started by solr
  2010/01/11 04:20:06 command:
  /opt/solr/solr_slave/solr/solr/bin/snapinstaller
  2010/01/11 04:20:07 installing snapshot
  /opt/solr/solr_slave/solr/data/snapshot.20100111041402
  2010/01/11 04:20:07 notifing Solr to open a new Searcher
  2010/01/11 04:20:07 failed to connect to Solr server
  2010/01/11 04:20:07 snapshot installed but Solr server has not open a new
  Searcher
  2010/01/11 04:20:08 failed (elapsed time: 1 sec)
 
 
  Configurations:
  There are 2 search servers in a virtualized VMware environment. Each has
  2
  instances of Solr running on separates ports in tomcat.
  Server 1: hosts 1 master(application 1), 1 slave (application 1)
  Server 2: hosts 1 master (application 2), 1 slave (application 1)
 
  Both servers have 4 CPUs and 4 GB RAM.
  Master
  - 4GB RAM
  - 1GB JVM Heap memory is allocated to Solr
  Slave1/Slave2:
  - 4GB RAM
  - 2GB JVM Heap memory is allocated to Solr
 
  Can there be any possible reasons that solr/home property couldn't be
 found?
 
  Thanks
  Dipti




update solr index

2010-01-11 Thread Marc Des Garets
Hi,

I am running Solr in Tomcat and I have about 35 indexes (between 2 and
80 million documents each). Currently, if I try to update a few documents
in an index (say the one which contains 80 million documents) while Tomcat
is running and therefore receiving requests, I get a few very long garbage
collection pauses (about 60 sec). I am running Tomcat with
-Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m, using
ConcMarkSweepGC.

I have 2 questions:
1. Is Solr doing something specific while an index is being updated, like
updating something in memory, which would cause the garbage collection?

2. Any idea how I could solve this problem? Currently I stop Tomcat,
update the index, and start Tomcat again. I would like to be able to update
my index while Tomcat is running. I was thinking about running more Tomcat
instances with less memory each, with each running a few of my indexes.
Do you think that would be the best way to go?


Thanks,
Marc

Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread MitchK

Is there any schema that explains which class is responsible for which
level of processing my data into the index?

My example was: I have categorized whether something is cheap or expensive.
Let's say I didn't do that on the fly, but with the help of the
UpdateRequestProcessor.
Imagine there is a query like "harry potter dvd-collection cheap" or "cheap
Harry Potter dvd-collection".
How can I make Solr, whenever something is said about the category
"cheap", use a faceting query on cat:cheap? To do so, I have to
alter the original query - how can I do that?
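There is no built-in hook that does exactly this. As a sketch (class and field names hypothetical), the rewrite being asked about could live in a custom request handler or search component that strips the recognized category word from q and turns it into a filter query:

```java
/** Illustrative sketch, not an existing Solr hook: detect a known
 *  category word in the user's query, remove it from q, and return
 *  an accompanying fq parameter value. */
public class CategoryRewrite {
    /** Returns { rewrittenQuery, filterQueryOrNull }. */
    static String[] rewrite(String userQuery) {
        if (userQuery.matches(".*\\bcheap\\b.*")) {
            // drop the category word from the main query
            String q = userQuery.replaceAll("\\s*\\bcheap\\b\\s*", " ").trim();
            return new String[] { q, "cat:cheap" };
        }
        return new String[] { userQuery, null };
    }
}
```

In Solr terms, the handler would then execute q plus fq=cat:cheap; the string surgery above is only the detection step.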
 

Erik Hatcher-4 wrote:
 
 
 On Jan 11, 2010, at 7:33 AM, MitchK wrote:
 Is the UpdateProcessor something that comes from Lucene itself or from
 Solr?
 
 It's at the Solr level -
 http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html
  
  
 
   Erik
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27111504.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to display Highlight with VelocityResponseWriter?

2010-01-11 Thread qiuyan . xu

Hi,

we need a web GUI for Solr, and we've noticed that
VelocityResponseWriter is integrated in Solr for that purpose.
But I have no idea how to configure solrconfig.xml so that snippets
with highlighting can also be displayed in the web GUI. I've added
<bool name="hl">true</bool> to the standard responseHandler and it
already works, i.e. without velocity. But the same line doesn't take
effect in itas. Should I configure anything else? Thanks in advance.


with best regards,
Qiuyan
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<config>
  <!-- Set this to 'false' if you want solr to continue working after it has
       encountered a severe configuration error.  In a production environment,
       you may want solr to keep working even if one handler is mis-configured.

       You may also set this to false by setting the system property:
         -Dsolr.abortOnConfigurationError=false
  -->
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <!-- Used to specify an alternate directory to hold all index data
       other than the default ./data under the Solr home.
       If replication is in use, this should match the replication configuration. -->
  <dataDir>${solr.data.dir:./solr/data}</dataDir>


  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <!--
      If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first.
    -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <!-- Tell Lucene when to flush documents to disk.
         Giving Lucene more memory for indexing means faster indexing at the cost of more RAM.

         If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first.
    -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>

    <!--
      Expert: Turn on Lucene's auto commit capability.
      This causes intermediate segment flushes to write a new lucene
      index descriptor, enabling it to be opened by an external
      IndexReader.
      NOTE: Despite the name, this value does not have any relation to Solr's autoCommit functionality.
    -->
    <!--<luceneAutoCommit>false</luceneAutoCommit>-->
    <!--
      Expert:
      The Merge Policy in Lucene controls how merging is handled by Lucene.  The default in 2.3 is the LogByteSizeMergePolicy, previous
      versions used LogDocMergePolicy.

      LogByteSizeMergePolicy chooses segments to merge based on their size.  The Lucene 2.2 default, LogDocMergePolicy, chose when
      to merge based on number of documents.

      Other implementations of MergePolicy must have a no-argument constructor.
    -->
    <!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->

    <!--
      Expert:
      The Merge Scheduler in Lucene controls how merges are performed.  The ConcurrentMergeScheduler (Lucene 2.3 default)
      can perform merges in the background using separate threads.  The SerialMergeScheduler (Lucene 2.2 default) does not.
    -->
    <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->

    <!--
      This option specifies which Lucene LockFactory implementation to use.

      single = SingleInstanceLockFactory - suggested for a read-only index
               or when there is no possibility of another process trying
               to modify the index.
      native = NativeFSLockFactory
      simple = SimpleFSLockFactory

      (For backwards compatibility with Solr 1.2, 'simple' is the default
       if not specified.)
    -->
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <!-- Deprecated -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>

    <!-- If true, unlock any held write or

Re: Getting solr response data in a JS query

2010-01-11 Thread Gregg Hoshovsky
You might be running into  an Ajax restriction.

See if an article like this helps.


http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/


On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Dan,

You didn't mention whether you tried wt=json .  Does it work if you use that 
to tell Solr to return its response in JSON format?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message 
 From: Dan Yamins dyam...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sat, January 9, 2010 10:05:54 PM
 Subject: Getting solr response data in a JS query

 Hi:

 I'm trying to figure out how to get Solr responses and use them in my
 website, and I'm having some problems.

 1) My initial thought is is to use ajax, and insert a line like this in my
 script:

  data = eval($.get("http://localhost:8983/solr/select/?q=*:*").responseText)

 ... and then do what I want with the data, with logic being done in
 Javascript on the front page.

 However, this is just not working technically:  no matter what alternative I
 use, I always seem to get no response to this query.  I think I'm having
 exactly the same problem as described here:

 http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html

 and here:

 http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code

 Just like those two OPs, I can definitely access my Solr responses through a
 web browser, but my jquery is getting nothing.    Unfortunately, in neither
 thread did the answer seem to have been figured out satisfactorily.   Does
 anybody know what the problem is?


 2)  As an alternative, I _can_ use  the ajax-solr library.   Code like this:

 var Manager;
 (function ($) {
   $(function () {
 Manager = new AjaxSolr.Manager({
   solrUrl: 'http://localhost:8983/solr/'
});

   Manager.init();
   Manager.store.addByValue('q', '*:*');
   Manager.store.addByValue('rows', '1000');
   Manager.doRequest();
   });
 })(jQuery);

 does indeed load solr data into my DOM.Somehow, ajax-solr's doRequest
 method is doing something that makes it possible to receive the proper
 response from the solr servlet, but I don't know what it is so I can't
 replicate it with my own ajax.   Does anyone know what is happening?

 (Of course, I _could_ just use ajax-solr, but doing so would mean figuring
 out how to re-write my existing application for how to display search
  results in a form that works with the ajax-solr api, and I'd rather avoid
 this if possible since it looks somewhat nontrivial.)


 Thanks!
 Dan




Re: Getting solr response data in a JS query

2010-01-11 Thread Matt Mitchell
I remember having a difficult time getting jquery to work as I thought it
would. Something to do with the wt. I ended up creating a little client lib.
Maybe this will be useful in finding your problem?

example:
  http://github.com/mwmitchell/get_rest/blob/master/solr_example.html
lib:
  http://github.com/mwmitchell/get_rest/blob/master/solr_client.jquery.js

Matt

On Mon, Jan 11, 2010 at 11:22 AM, Gregg Hoshovsky hosho...@ohsu.edu wrote:

 You might be running into  an Ajax restriction.

 See if an article like this helps.



 http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/


 On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 Dan,

 You didn't mention whether you tried wt=json .  Does it work if you use
 that to tell Solr to return its response in JSON format?

  Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



 - Original Message 
  From: Dan Yamins dyam...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, January 9, 2010 10:05:54 PM
  Subject: Getting solr response data in a JS query
 
  Hi:
 
  I'm trying to use figure out how to get solr responses and use them in my
  website.I'm having some problems figure out how to
 
  1) My initial thought is is to use ajax, and insert a line like this in
 my
  script:
 
   data = eval($.get(http://localhost:8983/solr/select/?q=*:*
  ).responseText)
 
  ... and then do what I want with the data, with logic being done in
  Javascript on the front page.
 
  However, this is just not working technically:  no matter what
 alternative I
  use, I always seem to get no response to this query.  I think I'm having
  exactly the same problem as described here:
 
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html
 %20http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html
 
  and here:
 
 
 http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code
 
  Just like those two OPs, I can definitely access my solr responese
 through a
  web browser, but my jquery is getting nothing.Unfortunately, in
 neither
  thread did the answer seem to have been figured out satisfactorily.
 Does
  anybody know what the problem is?
 
 
  2)  As an alternative, I _can_ use  the ajax-solr library.   Code like
 this:
 
  var Manager;
  (function ($) {
$(function () {
  Manager = new AjaxSolr.Manager({
solrUrl: 'http://localhost:8983/solr/'
 });
 
Manager.init();
Manager.store.addByValue('q', '*:*');
Manager.store.addByValue('rows', '1000');
Manager.doRequest();
});
  })(jQuery);
 
  does indeed load solr data into my DOM.Somehow, ajax-solr's doRequest
  method is doing something that makes it possible to receive the proper
  response from the solr servlet, but I don't know what it is so I can't
  replicate it with my own ajax.   Does anyone know what is happening?
 
  (Of course, I _could_ just use ajax-solr, but doing so would mean
 figuring
  out how to re-write my existing application for how to display search
  results in a form that works with the ajax-solr api, and I' d rather
 avoid
  this if possible since it looks somewhat nontrivial.)
 
 
  Thanks!
  Dan





Re: How to display Highlight with VelocityResponseWriter?

2010-01-11 Thread Sascha Szott

Qiuyan,


with highlight can also be displayed in the web gui. I've added <bool
name="hl">true</bool> into the standard responseHandler and it already
works, i.e. without velocity. But the same line doesn't take effect in
itas. Should i configure anything else? Thanks in advance.
First of all, just a few notes on the /itas request handler in your 
solrconfig.xml:


1. The entry

<arr name="components">
  <str>highlight</str>
</arr>

is obsolete, since the highlighting component is a default search 
component [1].


2. Note that since you didn't specify a value for hl.fl highlighting 
will only affect the fields listed inside of qf.


3. Why did you override the default value of hl.fragmenter? In most 
cases the default fragmenting algorithm (gap) works fine - and maybe in 
yours as well?



To make sure all your hl related settings are correct, can you post an 
xml output (change the wt parameter to xml) for a search with 
highlighted results?


And finally, can you post the vtl code snippet that should produce the 
highlighted output?


-Sascha

[1] http://wiki.apache.org/solr/SearchComponent








Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello,


We have implemented language-specific search in Solr using language-specific
fields and field types. For instance, an en_text field type can
use an English stemmer and a list of stopwords and synonyms. We, however,
did not use language-specific stopwords; instead we used one list shared by
both languages.

So you would have a field type like:

<fieldType name="en_text" class="solr.TextField" ...>
  <analyzer type="...">
    <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>

etc. etc.



Cheers,

-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17


On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:

 Hi Solr users.
 
 I'm trying to set up a site with Solr search integrated. And I use the
 SolJava API to feed the index with search documents. At the moment I
 have only activated search on the English portion of the site. I'm
 interested in using as many features of solr as possible. Synonyms,
 Stopwords and stems all sounds quite interesting and useful but how do
 I set up this in a good way for a multilingual site?
 
  The site doesn't have a huge text mass so performance issues don't
  really bother me, but still I'd like to hear your suggestions before I
  try to implement a solution.
 
 Best regards
 
 Daniel


Replication problem

2010-01-11 Thread Jason Rutherglen
Hi, sorry for the somewhat inane question:

I setup replication request handler on the master however I'm not
seeing any replicatable indexes via
http://localhost:8080/solr/main/replication?command=indexversion
Queries such as *:* yield results on the master (so I assume the
commit worked).  The replication console shows an index, so not sure
what's going on.  Here's the request handler XML on the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <!-- Replicate on 'optimize'. Other values can be 'commit',
         'startup'. It is possible to have multiple entries o$ -->
    <str name="replicateAfter">commit,optimize</str>

    <!-- Create a backup after 'optimize'. Other values can be
         'commit', 'startup'. It is possible to have multiple $ -->
    <!-- <str name="backupAfter">optimize</str> -->

    <!-- If configuration files need to be replicated give the names
         here, separated by comma -->
    <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the
         documentation below. Normally, you should not need to$ -->
    <str name="commitReserveDuration">00:10:00</str>
  </lst>
</requestHandler>
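For scripted monitoring, the indexversion response can be fetched and parsed outside the browser. A minimal sketch (assuming Python and the stdlib only; the host, port, and core name are the ones from this thread):

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_indexversion(xml_text):
    """Extract indexversion and generation from a replication handler response."""
    root = ET.fromstring(xml_text)
    values = {node.get("name"): int(node.text) for node in root.findall("long")}
    return values["indexversion"], values["generation"]

def fetch_indexversion(base_url):
    """Hit the master's replication handler, e.g. base_url='http://localhost:8080/solr/main'."""
    with urllib.request.urlopen(base_url + "/replication?command=indexversion") as resp:
        return parse_indexversion(resp.read())

# Example with a canned response (the shape posted later in this thread):
sample = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>"""
print(parse_indexversion(sample))  # (0, 0)
```

An indexversion of 0 is exactly the "no replicatable index" symptom described here.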


Re: Replication problem

2010-01-11 Thread Yonik Seeley
Did you try adding startup to the list of events to replicate after?

-Yonik
http://www.lucidimagination.com

On Mon, Jan 11, 2010 at 12:25 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Hi, sorry for the somewhat inane question:

 I setup replication request handler on the master however I'm not
 seeing any replicatable indexes via
 http://localhost:8080/solr/main/replication?command=indexversion
 Queries such as *:* yield results on the master (so I assume the
 commit worked).  The replication console shows an index, so not sure
 what's going on.  Here's the request handler XML on the master:

 requestHandler name=/replication class=solr.ReplicationHandler 
    lst name=master
       str name=enabletrue/str
       !--Replicate on 'optimize'. Other values can be 'commit',
 'startup'. It is possible to have multiple entries o$
       str name=replicateAftercommit,optimize/str

       !--Create a backup after 'optimize'. Other values can be
 'commit', 'startup'. It is possible to have multiple $
       !-- str name=backupAfteroptimize/str --

       !--If configuration files need to be replicated give the names
 here, separated by comma --
       str 
 name=confFilesschema.xml,synonyms.txt,stopwords.txt,elevate.xml/str
       !--The default value of reservation is 10 secs.See the
 documentation below . Normally , you should not need to$
       str name=commitReserveDuration00:10:00/str
    /lst
  /requestHandler



Re: Replication problem

2010-01-11 Thread Jason Rutherglen
Yonik,

I added startup to replicateAfter, however no dice... There are no
errors in the Tomcat log.

The output of:
http://localhost-master:8080/solr/main/replication?command=indexversion

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>

The master replication UI:
Local Index  Index Version: 1263182366335, Generation: 3
Location: /mnt/solr/main/data/index
Size: 1.08 KB

Master solrconfig.xml, and tomcat was restarted:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <!-- Replicate on 'optimize'. Other values can be 'commit',
         'startup'. It is possible to have multiple entries o$ -->
    <str name="replicateAfter">startup,commit,optimize</str>

    <!-- Create a backup after 'optimize'. Other values can be
         'commit', 'startup'. It is possible to have multiple $ -->
    <!-- <str name="backupAfter">optimize</str> -->

    <!-- If configuration files need to be replicated give the names
         here, separated by comma -->
    <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the
         documentation below. Normally, you should not need to$ -->
    <str name="commitReserveDuration">00:10:00</str>
  </lst>
</requestHandler>


On Tue, Jan 12, 2010 at 11:29 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Did you try adding startup to the list of events to replicate after?

 -Yonik
 http://www.lucidimagination.com

 On Mon, Jan 11, 2010 at 12:25 PM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 Hi, sorry for the somewhat inane question:

 I setup replication request handler on the master however I'm not
 seeing any replicatable indexes via
 http://localhost:8080/solr/main/replication?command=indexversion
 Queries such as *:* yield results on the master (so I assume the
 commit worked).  The replication console shows an index, so not sure
 what's going on.  Here's the request handler XML on the master:

 requestHandler name=/replication class=solr.ReplicationHandler 
    lst name=master
       str name=enabletrue/str
       !--Replicate on 'optimize'. Other values can be 'commit',
 'startup'. It is possible to have multiple entries o$
       str name=replicateAftercommit,optimize/str

       !--Create a backup after 'optimize'. Other values can be
 'commit', 'startup'. It is possible to have multiple $
       !-- str name=backupAfteroptimize/str --

       !--If configuration files need to be replicated give the names
 here, separated by comma --
       str 
 name=confFilesschema.xml,synonyms.txt,stopwords.txt,elevate.xml/str
       !--The default value of reservation is 10 secs.See the
 documentation below . Normally , you should not need to$
       str name=commitReserveDuration00:10:00/str
    /lst
  /requestHandler




help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
hello *, I'm looking for help writing queries to implement a few
business rules.


1. Given a set of fields, how do I return matches across them
but not just one specific one? E.g., I'm using a dismax parser currently
but I want to exclude any results that only match against a field
called 'description2'.


2. Given a set of fields, how do I return matches across them
but, on one specific field, match as a phrase only? E.g., I'm using a dismax
parser currently but I want matches against a field called 'people' to
only match as a phrase.


thx much,

--joe


Re: help implementing a couple of business rules

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:

1. given a set of fields how to return matches that match across them
but not just one specific one, ex im using a dismax parser currently
but i want to exclude any results that only match against a field
called 'description2'


One way could be to add an fq parameter to the request:

   fq=-description2:(query)


2. given a set of fields how to return matches that match across them
but on one specific field match as a phrase only, ex im using a dismax
parser currently but i want matches against a field called 'people' to
only match as a phrase


Doesn't setting pf=people accomplish this?

Erik
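Combining the two suggestions, a request might be assembled like this (a sketch; the field list besides description2 and people is invented, and note that a bare fq exclusion also drops documents that match description2 alongside other fields, which is stricter than rule 1 asks for, as Joe points out below):

```python
from urllib.parse import urlencode

# Hypothetical dismax request combining both suggestions:
# - fq excludes any doc matching description2 for this query
#   (stricter than "matched ONLY description2")
# - pf boosts/requires phrase behavior for the people field
params = {
    "q": "some query",
    "defType": "dismax",
    "qf": "title description2 people",   # illustrative field list
    "fq": "-description2:(some query)",
    "pf": "people",
}
print("/solr/select?" + urlencode(params))
```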



Re: help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
thx, but I'm not sure that covers all edge cases. To clarify:
1. matching description2 is okay if other fields are matched too, but
results matching only description2 should be omitted

2. it's okay to not match against the people field, but matches against
the people field should only be phrase matches

sorry if I was unclear

--joe
On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

 On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:

 1. given a set of fields how to return matches that match across them
 but not just one specific one, ex im using a dismax parser currently
 but i want to exclude any results that only match against a field
 called 'description2'

 One way could be to add an fq parameter to the request:

   fq=-description2:(query)

 2. given a set of fields how to return matches that match across them
 but on one specific field match as a phrase only, ex im using a dismax
 parser currently but i want matches against a field called 'people' to
 only match as a phrase

 Doesn't setting pf=people accomplish this?

        Erik




Re: Understanding the query parser

2010-01-11 Thread Avlesh Singh

 It is in the source code of QueryParser's getFieldQuery(String field,
 String queryText) method, line #660. If numTokens > 1 it returns a Phrase
 Query.

That's exactly the question. Would be nice to hear from someone as to why it
is that way.

Cheers
Avlesh

On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan iori...@yahoo.com wrote:


  I am running in to the same issue. I have tried to replace
  my
  WhitespaceTokenizerFactory with a PatternTokenizerFactory
  with pattern
  (\s+|-) but I still seem to get a phrase query. Why is
  that?

  It is in the source code of QueryParser's getFieldQuery(String field,
  String queryText) method, line #660. If numTokens > 1 it returns a Phrase
  Query.

  Modifications in the analysis phase (CharFilterFactory, TokenizerFactory,
  TokenFilterFactory) won't change this behavior. Something must be done
  before the analysis phase.

  But I think in your case, you can obtain a match by modifying the parameters
  of WordDelimiterFilterFactory, even with a PhraseQuery.
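The regex itself does split on the hyphen; the phrase query comes later, from the query parser, not from the tokenizer. A quick check of the pattern in isolation (a sketch, assuming Python's re module just to illustrate the regex):

```python
import re

# The pattern rswart configured for PatternTokenizerFactory:
pattern = r"(\s+|-)"

# re.split with a capturing group also returns the separators,
# so drop them to mimic tokenizer output.
tokens = [t for t in re.split(pattern, "39-43")
          if t and not re.fullmatch(pattern, t)]
print(tokens)  # ['39', '43']
```

So "39-43" does become two tokens; it is getFieldQuery that then wraps the multiple tokens in a PhraseQuery.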






Cores + Replication Config

2010-01-11 Thread Giovanni Fernandez-Kincade
If you want to share one config amidst master & slaves, using Solr 1.4 
replication, is there a way to specify whether a core is Master or Slave when 
using the CREATE core command?

Thanks,
Gio.


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

Thanks, we were having the same issue.
We are trying to store article content and we are storing a field like
<p>This article is for blah</p>.
When I see the analysis.jsp page it does strip out the <p> tags and it is
indexed. But when we fetch the document it returns the field with the <p>
tags.
From Solr's point of view, it's correct, but our issue is that this kind of html
tag is screwing up the display of our page. Is there an easy way to ensure
the html tags are stripped out, or do we have to take care of it manually?

Thanks
Rashid


aseem cheema wrote:
 
 Alright. It turns out that escapedTags is not for what I thought it is
 for.
 The problem that I am having with HTMLStripCharFilterFactory is that
 it strips the html while indexing the field, but not while storing the
 field. That is why what I see in analysis.jsp, which is index
 analysis, does not match what gets stored... because... well, HTML is
 stripped only for indexing. Makes so much sense.
 
 Thanks to Ryan McKinley for clarifying this.
 Aseem
 
 On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com
 wrote:
 I am trying to post a document with the following content using SolrJ:
 <center>content</center>
 I need the xml/html tags to be ignored. Even though this works fine in
 analysis.jsp, this does not work with SolrJ, as the client escapes the
 < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
 strip those escaped tags. How can I achieve this? Any ideas will be
 highly appreciated.

 There is escapedTags in HTMLStripCharFilterFactory constructor. Is
 there a way to get that to work?
 Thanks
 --
 Aseem

 
 
 
 -- 
 Aseem
 
 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
What do your FieldTypes look like for the fields in question?

On Jan 10, 2010, at 10:05 AM, rswart wrote:

 
 Hi,
 
 This is probably an easy question. 
 
 I am doing a simple query on postcode and house number. If the housenumber
 contains a minus sign like:
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
 
 the resulting parsed query contains a phrase query:
 
  +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
 
 This never matches.
 
 What I want solr to do is generate the following parsed query (essentially
 an OR for both house numbers):
 
 +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
 
 Solr generates this based on the following query (so a space instead of a
 minus sign):
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
 
 
 I tried two things to have Solr generate the desired parsed query:
 
 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in
 a phrase query
 2. PatternTokenizerFactory that splits on (\s+|-).
 
 But both options don't work. 
 
 Any suggestions on how to get rid of the phrase query?
 
 Thanks,
 
 Richard
 -- 
 View this message in context: 
 http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
And also, what query parser are you using? 
On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:

 What do your FieldTypes look like for the fields in question?
 
 On Jan 10, 2010, at 10:05 AM, rswart wrote:
 
 
 Hi,
 
 This is probably an easy question. 
 
 I am doing a simple query on postcode and house number. If the housenumber
 contains a minus sign like:
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
 
 the resulting parsed query contains a phrase query:
 
 +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43)
 
 This never matches.
 
 What I want solr to do is generate the following parsed query (essentially
 an OR for both house numbers):
 
 +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
 
 Solr generates this based on the following query (so a space instead of a
 minus sign):
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
 
 
 I tried two things to have Solr generate the desired parsed query:
 
 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in
 a phrase query
 2. PatternTokenizerFactory that splits on (\s+|-).
 
 But both options don't work. 
 
 Any suggestions on how to get rid of the phrase query?
 
 Thanks,
 
 Richard
 -- 
 View this message in context: 
 http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene: 
 http://www.lucidimagination.com/search
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Erick Erickson
This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
shows you many of the SOLR analyzers and filters. Would one of
the various *HTMLStrip* options work?

HTH
Erick

On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote:


 Thanks we were having the saem issue.
 We are trying to store article content and we are strong a field like
 pThis article is for blah /p.
 Wheni see the analysis.jsp page it does strip out the p tags and is
 indexed. but when we fetch the document it returns the field with the p
 tags.
 From solr point of view, its correct but our issue is that this kind of
 html
 tags is screwing up our display of our page. Is there an easy way to esure
 how to strip out hte html tags, or do we have to take care of manually.

 Thanks
 Rashid


 aseem cheema wrote:
 
  Alright. It turns out that escapedTags is not for what I thought it is
  for.
  The problem that I am having with HTMLStripCharFilterFactory is that
  it strips the html while indexing the field, but not while storing the
  field. That is why what is see in analysis.jsp, which is index
  analysis, does not match what gets stored... because.. well HTML is
  stripped only for indexing. Makes so much sense.
 
  Thanks to Ryan McKinley for clarifying this.
  Aseem
 
  On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com
  wrote:
  I am trying to post a document with the following content using SolrJ:
  centercontent/center
  I need the xml/html tags to be ignored. Even though this works fine in
  analysis.jsp, this does not work with SolrJ, as the client escapes the
   and  with lt; and gt; and HTMLStripCharFilterFactory does not
  strip those escaped tags. How can I achieve this? Any ideas will be
  highly appreciated.
 
  There is escapedTags in HTMLStripCharFilterFactory constructor. Is
  there a way to get that to work?
  Thanks
  --
  Aseem
 
 
 
 
  --
  Aseem
 
 

 --
 View this message in context:
 http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

Well, that's the whole discussion we are talking about.
I had the impression that the html tags are filtered and then the field is
stored without tags. But it looks like the html tags are removed and terms are
indexed purely for indexing, while the actual text is stored in raw format.

Let's say, for example, I enter a field like
<field name="body"><p>honda car road review</p></field>
When I do analysis on the body field the html filter removes the <p> tag and
indexes the words honda, car, road, review. But when I fetch the body field to
display in my document it returns <p>honda car road review</p>.

I hope I make sense.
thanks
darniz



Erick Erickson wrote:
 
 This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFiltersshows you
 many
 of the SOLR analyzers and filters. Would one of
 the various *HTMLStrip* stuff work?
 
 HTH
 ERick
 
 On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote:
 

 Thanks we were having the saem issue.
 We are trying to store article content and we are strong a field like
 pThis article is for blah /p.
 Wheni see the analysis.jsp page it does strip out the p tags and is
 indexed. but when we fetch the document it returns the field with the p
 tags.
 From solr point of view, its correct but our issue is that this kind of
 html
 tags is screwing up our display of our page. Is there an easy way to
 esure
 how to strip out hte html tags, or do we have to take care of manually.

 Thanks
 Rashid


 aseem cheema wrote:
 
  Alright. It turns out that escapedTags is not for what I thought it is
  for.
  The problem that I am having with HTMLStripCharFilterFactory is that
  it strips the html while indexing the field, but not while storing the
  field. That is why what is see in analysis.jsp, which is index
  analysis, does not match what gets stored... because.. well HTML is
  stripped only for indexing. Makes so much sense.
 
  Thanks to Ryan McKinley for clarifying this.
  Aseem
 
  On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com
  wrote:
  I am trying to post a document with the following content using SolrJ:
  centercontent/center
  I need the xml/html tags to be ignored. Even though this works fine in
  analysis.jsp, this does not work with SolrJ, as the client escapes the
   and  with lt; and gt; and HTMLStripCharFilterFactory does not
  strip those escaped tags. How can I achieve this? Any ideas will be
  highly appreciated.
 
  There is escapedTags in HTMLStripCharFilterFactory constructor. Is
  there a way to get that to work?
  Thanks
  --
  Aseem
 
 
 
 
  --
  Aseem
 
 

 --
 View this message in context:
 http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adaptive search?

2010-01-11 Thread Chris Hostetter

: I was talking about boosting documents using past popularity. So a user
: searches for X and gets 10 results. This view is recorded for each of the 10
: documents and added to the index later. If a user clicks on result #2, the
: click is recorded for doc #2 and added to index. We boost using clicks/view.

FWIW: I've observed three problems with this type of metric...

1) render vs view ... what you are calling a view is really a 
rendering -- you are sending the data back to include the item in the 
list of 10 items on the page, and the browser is rendering it, but that 
doesn't mean the user is actually viewing it -- particularly in a 
webpage type situation where only the first 3-5 results might actually 
appear above the fold and the user has to scroll to see the rest.  Even 
in a smaller UI element (like a left or right nav info box), there's no 
guarantee that the user actually views any of the items, which can bias 
things.

2) It doesn't take into account people who click on a result, decide it's 
terrible, hit the back arrow and click on a different result -- both of 
those wind up scoring equally.  Some really complex session+click 
analysis can overcome this, but not a lot of people have the resources to 
do that all the time.

3) ignoring #1 and #2 above (because I haven't found many better options) 
you face the popularity problem -- or what my coworkers and I used to call 
the TRL Problem back in the 90s:  MTV's Total Request Live was a Top X 
countdown show of videos, featuring the most popular videos of the week 
based on requests -- but it was also the number one show on the network, 
occupying something like 4/24 broadcast hours of every day, when there were 
only a total of 6/24 hours that actually showed music videos.  So for 
the most part the only videos people ever saw were on TRL, so those were 
the only videos that ever got requested.

In a nutshell: once something becomes popular and is what everybody 
sees, it stays popular, because it's what everybody sees, and they don't 
know that there is better stuff out there.

Even if everyone looks at the full list of results and actually reads all 
of the first 10 summaries, in the absence of any other bias their 
inclination is going to be to assume #1 is the best.  So they might click 
on that even if another result on the list appears better based on their 
opinion.

A variation that I did some experiments with, but never really refined 
because I didn't have the time/energy to really go to town on it, is to 
weight the clicks based on position:  a click on item #1 shouldn't be 
worth anything -- it's the number one result, the expectation is that it 
better get clicked or something is wrong.  A click on #2 is worth 
something to that item, and a click on #3 is worth more to that item, and 
so on ... so that if the #9 item gets a click, that's huge.  To do it 
right, I think what you really want to do is penalize items that get views 
but no clicks -- because if someone loads up results 1-10, and doesn't 
click on any of them, that should be a vote in favor of moving all of them 
down and moving item #11 up (even though it got no views or clicks)

But like I said: I never experimented with this idea enough to come up 
with a good formula, or verify that the idea was sound.

-Hoss
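The position-weighted idea sketched above could start out something like this (the weights and the unclicked-view penalty are invented for illustration, not taken from the thread; scores are keyed by position here for brevity, where in practice you would key by document id):

```python
def position_weighted_score(events, num_results=10):
    """Score items from (position, clicked) render events.

    A click on position 1 is worth nothing; clicks further down the
    page are worth progressively more. Rendered-but-unclicked items
    take a small penalty. Weights are illustrative, not tuned.
    """
    scores = {}
    for position, clicked in events:
        if clicked:
            scores[position] = scores.get(position, 0.0) + (position - 1) / num_results
        else:
            scores[position] = scores.get(position, 0.0) - 0.01
    return scores

# One result page: the user clicked position 9 and ignored the rest,
# so #9 gets a big boost and every unclicked item drifts down.
events = [(p, p == 9) for p in range(1, 11)]
print(position_weighted_score(events)[9])  # 0.8
```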



Re: Getting solr response data in a JS query

2010-01-11 Thread James McKinney

AJAX Solr does more or less the following:

jQuery.getJSON('http://localhost:8983/solr/select/?q=*:*&wt=json&json.wrf=?',
{}, function (data) {
  // do something with data, which is the eval'd JSON response
});
-- 
View this message in context: 
http://old.nabble.com/Getting-solr-response-data-in-a-JS-query-tp27095224p27116970.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Chris Hostetter

: stored without tags. But looks like the html tags are removed and terms are
: indexed purely for indexing, and the actual text is stored in raw format.

Correct. Analysis is all about indexing; it has nothing to do with 
stored content.

You can write UpdateProcessors that modify the content before it is either 
indexed or stored, but there aren't a lot of Processors provided out of 
the box at the moment.

-Hoss
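Absent a custom UpdateProcessor, one workaround is to strip tags on the client before sending documents to Solr, so the stored value is already clean. A minimal sketch of that idea (assuming Python on the indexing client; this roughly mirrors, for the stored value, what HTMLStripCharFilter does at index time):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect only the text content, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    return "".join(stripper.parts)

print(strip_tags("<p>This article is for blah</p>"))  # This article is for blah
```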



Re: Tokenizer question

2010-01-11 Thread rswart

We are using the standard query parser (so no dismax).

Fieldtype is solr.TextField with the following query analyzer:

<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="(\s+|-)" />
  <filter class="solr.StopFilterFactory"
          words="../../../synonyms/nl_stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory"
          synonyms="../../../synonyms/nl_synonyms.txt" ignoreCase="true"
          expand="true" />
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="-" replacement=" " replace="all" />
  <filter class="com.foo.IgnoreListWordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="0" catenateAll="0" preserveOriginal="0"
          splitOnCaseChange="0" ignoreList="@&amp;"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^0+(.)" replacement="$1" replace="all" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>




Grant Ingersoll-6 wrote:
 
 And also, what query parser are you using? 
 On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:
 
 What do your FieldTypes look like for the fields in question?
 
 On Jan 10, 2010, at 10:05 AM, rswart wrote:
 
 
 Hi,
 
 This is probably an easy question. 
 
 I am doing a simple query on postcode and house number. If the
 housenumber
 contains a minus sign like:
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
 
 the resulting parsed query contains a phrase query:
 
 +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43)
 
 This never matches.
 
 What I want solr to do is generate the following parsed query
 (essentially
 an OR for both house numbers):
 
 +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
 
 Solr generates this based on the following query (so a space instead of
 a
 minus sign):
 
 q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
 
 
 I tried two things to have Solr generate the desired parsed query:
 
 1. WordDelimiterFilterFactory with generateNumberParts=1 but this
 results in
 a phrase query
 2. PatternTokenizerFactory that splits on (\s+|-).
 
 But both options don't work. 
 
 Any suggestions on how to get rid of the phrase query?
 
 Thanks,
 
 Richard
 -- 
 View this message in context:
 http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27117036.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Erick Erickson
Ah, I read your post too fast and ignored the title. Sorry 'bout that.

Erick

On Mon, Jan 11, 2010 at 2:55 PM, darniz rnizamud...@edmunds.com wrote:


 Well, that's the whole discussion we are talking about.
 I had the impression that the html tags are filtered and then the field is
 stored without tags. But it looks like the html tags are removed only for
 indexing, and the actual text is stored in raw format.

 Lets say for example i enter a field like
 <field name="body"><p>honda car road review</p></field>
 When i do analysis on the body field the html filter removes the <p> tag
 and the indexed words are honda, car, road, review. But when i fetch the
 body field to display in my document it returns <p>honda car road review</p>

 I hope i make sense.
 thanks
 darniz



 Erick Erickson wrote:
 
  This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFiltersshows you
  many
  of the SOLR analyzers and filters. Would one of
  the various *HTMLStrip* stuff work?
 
  HTH
  ERick
 
  On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote:
 
 
  Thanks, we were having the same issue.
  We are trying to store article content and we are storing a field like
  <p>This article is for blah</p>.
  When i see the analysis.jsp page it does strip out the <p> tags and is
  indexed, but when we fetch the document it returns the field with the <p>
  tags.
  From solr point of view, its correct but our issue is that this kind of
  html tags is screwing up the display of our page. Is there an easy way to
  ensure the html tags are stripped out, or do we have to take care of it
  manually.
 
  Thanks
  Rashid
 
 
  aseem cheema wrote:
  
   Alright. It turns out that escapedTags is not for what I thought it is
   for.
   The problem that I am having with HTMLStripCharFilterFactory is that
   it strips the html while indexing the field, but not while storing the
    field. That is why what I see in analysis.jsp, which is index
   analysis, does not match what gets stored... because.. well HTML is
   stripped only for indexing. Makes so much sense.
  
   Thanks to Ryan McKinley for clarifying this.
   Aseem
  
   On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com
   wrote:
    I am trying to post a document with the following content using SolrJ:
    <center>content</center>
    I need the xml/html tags to be ignored. Even though this works fine in
    analysis.jsp, this does not work with SolrJ, as the client escapes the
    < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
    strip those escaped tags. How can I achieve this? Any ideas will be
    highly appreciated.
  
   There is escapedTags in HTMLStripCharFilterFactory constructor. Is
   there a way to get that to work?
   Thanks
   --
   Aseem
  
  
  
  
   --
   Aseem
  
  
 
  --
  View this message in context:
 
 http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Search query log using solr

2010-01-11 Thread Chris Hostetter

: application. I am planning to add a search query log that will capture all
: the search queries (and more information like IP,user info,date time,etc).
: I understand I can easily do this on the application side capturing all the
: search request, logging them in a DB/File before sending them to solr for
: execution.
:  But I wanted to check with the forum if there was any better
: approach OR best practices OR anything that has been added to Solr for such
: requirement.

doing this in your application is probably the best bet ... you could put 
all of the extra info in query args to solr, which would be ignored but 
included in Solr's own logs, except that would muck up any HTTP Caching 
you might do (and putting an Accelerator Cache in front of Solr is a 
really easy way to reduce load in a lot of common situations)
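The extra-query-args idea can be sketched like this; the logging parameter names (`ip`, `user`, etc.) are made up for illustration, since Solr simply ignores parameters it doesn't recognize:

```python
from urllib.parse import urlencode

def solr_select_url(base, query, **log_info):
    """Build a /select URL; unknown params are ignored by Solr but appear
    in its request log.

    Caveat from the message above: per-user extra params make every URL
    unique, which defeats HTTP caching placed in front of Solr.
    """
    params = {"q": query, "wt": "json"}
    params.update(log_info)  # e.g. ip=..., user=... (ignored by Solr itself)
    return base + "?" + urlencode(params)

url = solr_select_url("http://localhost:8983/solr/select", "honda", ip="10.0.0.1")
```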

-Hoss



Re: Understanding the query parser

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:



It is in the source code of QueryParser's getFieldQuery(String field,
String queryText) method, line #660. If numTokens > 1 it returns a
PhraseQuery.

That's exactly the question. Would be nice to hear from someone as to why
it is that way?


Suppose you indexed Foo Bar.  It'd get indexed as two tokens [foo]  
followed by [bar].  Then someone searches for foo-bar, which would get  
analyzed into two tokens also.  A PhraseQuery is the most logical  
thing for it to turn into, no?


What's the alternative?

Of course it's tricky business though, impossible to do the right  
thing for all cases within SolrQueryParser.  Thankfully it is  
pleasantly subclassable and overridable for this method.


Erik



Commons Lang

2010-01-11 Thread Jeff Newburn
We have a solr plugin that would be much easier to write if commons-lang was
available.  Why does solr not have this library?  Is there any drawbacks to
pulling in the commons lang for StringUtils?
-- 
Jeff Newburn
Software Engineer, Zappos.com


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

no problem

Erick Erickson wrote:
 
 Ah, I read your post too fast and ignored the title. Sorry 'bout that.
 
 Erick
 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27118304.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well.

2010/1/11 Markus Jelsma mar...@buyways.nl

 Hello,


 We have implemented language-specific search in Solr using
 language-specific fields and field types. For instance, an en_text field
 type can use an English stemmer and its own lists of stopwords and
 synonyms. We, however, did not use language-specific stopwords; instead we
 used one list shared by both languages.

 So you would have a field type like:
 <fieldType name="en_text" class="solr.TextField" ...>
   <analyzer type="...">
     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>

 etc etc.



 Cheers,

 -
 Markus Jelsma  Buyways B.V.
 Technisch ArchitectFriesestraatweg 215c
 http://www.buyways.nl  9743 AD Groningen


 Alg. 050-853 6600  KvK  01074105
 Tel. 050-853 6620  Fax. 050-3118124
 Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17


 On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:

  Hi Solr users.
 
  I'm trying to set up a site with Solr search integrated. And I use the
  SolJava API to feed the index with search documents. At the moment I
  have only activated search on the English portion of the site. I'm
  interested in using as many features of solr as possible. Synonyms,
  Stopwords and stems all sounds quite interesting and useful but how do
  I set up this in a good way for a multilingual site?
 
  The site doesn't have a huge text mass, so performance issues don't
  really bother me, but still I'd like to hear your suggestions before I
  try to implement a solution.
 
  Best regards
 
  Daniel



Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

I am in the process of building a Solr search solution for my application and
have run into a roadblock with the schema design.  Trying to match criteria
in one multi-valued field with corresponding criteria in another
multi-valued field.  Any advice would be greatly appreciated.

BACKGROUND:
My RDBMS data model is such that for every one of my Product entities,
there are one-to-many SKU entities available for purchase. Each SKU entity
can have its own price, as well as one-to-many options, etc.  The web
frontend displays available Product entities on both directory and detail
pages.

In order to take advantage of Solr's facet count, paging, and sorting
functionality, I decided to base the Solr schema on Product documents; so
none of my documents currently contain duplicate Product data, and all
SKU related data is denormalized as necessary, but into multi-valued
fields.  For example, I have a document with an id field set to
Product:7, a docType field is set to Product as well as multi-valued
SKU related fields and data like, sku_color {Red | Green | Blue},
sku_size {Small | Medium | Large}, sku_price {10.00 | 10.00 | 7.99}

I hit the roadblock when I tried to answer the question, Which products are
available that contain skus with color Green, size M, and a price of $9.99
or less?...and have now begun the switch to SKU level indexing.  This
also gives me what I need for faceted browsing/navigation, and search
refinement...leading the user to Product entities having purchasable SKU
entities.  But this also means I now have documents which are mostly
duplicates for each Product, and all facet counts, paging and sorting are
then inaccurate; so it appears I need to do this myself, with multiple Solr
requests.

Is this really the best approach; and if so, should I use the Solr
Deduplication update processor when indexing and querying?

Thanks in advance,
Kelly
-- 
View this message in context: 
http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


I am not entirely sure if i understand your problem correctly. But i
believe your first approach is the right one.

Your question: Which products are available that contain skus with color
Green, size M, and a price of $9.99 or less? can be easily answered using
a schema like yours.

id = 1
color = [green, blue]
size = [M, S]
price = 6

id = 2
color = [red, blue]
size = [L, S]
price = 12

id = 3
color = [green, red, blue]
size = [L, S, M]
price = 5

Using the data above you can answer your question using a basic Solr query
[1] like the following: q=color:green AND price:[0 TO 9.99] AND size:M

Of course, you would make this a function query [2] but this, if i
understood your question well enough, answers it.

[1] http://wiki.apache.org/solr/SolrQuerySyntax
[2] http://wiki.apache.org/solr/FunctionQuery


Cheers,


Kelly Taylor zei:

 I am in the process of building a Solr search solution for my
 application and have run into a roadblock with the schema design.
 Trying to match criteria in one multi-valued field with corresponding
 criteria in another
 multi-valued field.  Any advice would be greatly appreciated.

 BACKGROUND:
 My RDBMS data model is such that for every one of my Product entities,
 there are one-to-many SKU entities available for purchase. Each SKU
 entity can have its own price, as well as one-to-many options, etc.  The
 web frontend displays available Product entities on both directory and
 detail pages.

 In order to take advantage of Solr's facet count, paging, and sorting
 functionality, I decided to base the Solr schema on Product documents;
 so none of my documents currently contain duplicate Product data, and
 all SKU related data is denormalized as necessary, but into
 multi-valued fields.  For example, I have a document with an id field
 set to
 Product:7, a docType field is set to Product as well as
 multi-valued SKU related fields and data like, sku_color {Red |
 Green | Blue}, sku_size {Small | Medium | Large}, sku_price {10.00 |
 10.00 | 7.99}

 I hit the roadblock when I tried to answer the question, Which products
 are available that contain skus with color Green, size M, and a price of
 $9.99 or less?...and have now begun the switch to SKU level indexing.
  This also gives me what I need for faceted browsing/navigation, and
 search refinement...leading the user to Product entities having
 purchasable SKU entities.  But this also means I now have documents
 which are mostly duplicates for each Product, and all, facet counts,
 paging and sorting is then inaccurate;  so it appears I need do this
 myself, with multiple Solr requests.

 Is this really the best approach; and if so, should I use the Solr
 Deduplication update processor when indexing and querying?

 Thanks in advance,
 Kelly
 --
 View this message in context:
 http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
 Sent from the Solr - User mailing list archive at Nabble.com.





EOF IOException Query

2010-01-11 Thread Osborn Chan
Hi all,

I got the following exception from Solr, but the index is still searchable. (At 
least it is searchable for query *:*.)
I am just wondering what is the root cause.

Thanks,
Osborn

INFO: [publicGalleryPostMaster] webapp=/multicore path=/select 
params={wt=javabin&rows=12&start=0&sort=/gallery/1/postlist/1Rank_i+desc&q=%2B(communityList_s_m:/gallery/1/postlist/1)+%2Bstate_s:A&version=1} status=500 QTime=3
Jan 11, 2010 12:23:01 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:712)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)


Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

Hi Markus,

Thanks for your reply.

Using the current schema and query like you suggest, how can I identify the
unique combination of options and price for a given SKU?   I don't want the
user to arrive at a product which doesn't completely satisfy their search
request.  For example, with the color:Green, size:M, and price:[0 to
9.99] search refinements applied,  no products should be displayed which
only have size:M in color:Blue

The actual data in the database for a product to display on the frontend
could be as follows:

product id = 1
product name = T-shirt

related skus...
-- sku id = 7 [color=green, size=S, price=10.99]
-- sku id = 9 [color=green, size=L, price=10.99]
-- sku id = 10 [color=blue, size=S, price=9.99]
-- sku id = 11 [color=blue, size=M, price=10.99]
-- sku id = 12 [color=blue, size=L, price=10.99]

Regards,
Kelly


Markus Jelsma - Buyways B.V. wrote:
 
 Hello Kelly,
 
 
 I am not entirely sure if i understand your problem correctly. But i
 believe your first approach is the right one.
 
 Your question: Which products are available that contain skus with color
 Green, size M, and a price of $9.99 or less? can be easily answered using
 a schema like yours.
 
 id = 1
 color = [green, blue]
 size = [M, S]
 price = 6
 
 id = 2
 color = [red, blue]
 size = [L, S]
 price = 12
 
 id = 3
 color = [green, red, blue]
 size = [L, S, M]
 price = 5
 
 Using the data above you can answer your question using a basic Solr query
 [1] like the following: q=color:green AND price:[0 TO 9.99] AND size:M
 
 Of course, you would make this a function query [2] but this, if i
 understood your question well enough, answers it.
 
 [1] http://wiki.apache.org/solr/SolrQuerySyntax
 [2] http://wiki.apache.org/solr/FunctionQuery
 
 
 Cheers,
 
 

-- 
View this message in context: 
http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


Simple boolean algebra: you tell Solr you want color = green AND size = M,
so it will only return green t-shirts in size M. If you, however, turn the
AND into an OR, it will return all t-shirts that are green OR in size M;
thus you can then get M-sized shirts in blue, or green shirts in size
XXL.

I suggest you just give it a try and perhaps come back later to find
some improvements for your query. It would also be a good idea - if i may
say so - to read the links provided in the earlier message.

Hope you will find what you're looking for :)


Cheers,

Kelly Taylor zei:

 Hi Markus,

 Thanks for your reply.

 Using the current schema and query like you suggest, how can I identify
 the unique combination of options and price for a given SKU?   I don't
 want the user to arrive at a product which doesn't completely satisfy
 their search request.  For example, with the color:Green, size:M,
 and price:[0 to 9.99] search refinements applied,  no products should
 be displayed which only have size:M in color:Blue

 The actual data in the database for a product to display on the frontend
 could be as follows:

 product id = 1
 product name = T-shirt

 related skus...
 -- sku id = 7 [color=green, size=S, price=10.99]
 -- sku id = 9 [color=green, size=L, price=10.99]
 -- sku id = 10 [color=blue, size=S, price=9.99]
 -- sku id = 11 [color=blue, size=M, price=10.99]
 -- sku id = 12 [color=blue, size=L, price=10.99]

 Regards,
 Kelly



 --
 View this message in context:
 http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

Hi Markus,

Thanks again. I wish this were simple boolean algebra. This is something I
have already tried. So either I am missing the boat completely, or have
failed to communicate it clearly. I didn't want to confuse the issue further
but maybe the following excerpts will help...

Excerpt from "Solr 1.4 Enterprise Search Server" by David Smiley & Eric
Pugh...

...the criteria for this hypothetical search involves multi-valued fields,
where the index of one matching criteria needs to correspond to the same
value in another multi-valued field in the same index. You can't do that...

And this excerpt is from "Solr and RDBMS: The basics of designing your
application for the best of both" by Amit Nithianandan...

...If I wanted to allow my users to search for wiper blades available in a
store nearby, I might create an index with multiple documents or records for
the same exact wiper blade, each document having different location data
(lat/long, address, etc.) to represent an individual store. Solr has a
de-duplication component to help show unique documents in case that
particular wiper blade is available in multiple stores near me...

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics

Remember, with my original schema definition I have multi-valued fields, and
when the product document is built, these fields do contain an array of
values retrieved from each of the related skus. Skus are children of my
products.

Using your example data, which t-shirt sku is available for purchase as a
child of t-shirt product with id 3? Is it really the green, M, or have we
found a product document related to both a green t-shirt and a Medium
t-shirt of some other color, which will thereby leave the user with nothing
to purchase?

sku = 9 [color=green, size=L, price=10.99], product id = 3
sku = 10 [color=blue, size=S, price=9.99], product id = 3
sku = 11 [color=blue, size=M, price=10.99], product id = 3

 id = 1
 color = [green, blue]
 size = [M, S]
 price = 6

 id = 2
 color = [red, blue]
 size = [L, S]
 price = 12

 id = 3
 color = [green, red, blue]
 size = [L, S, M]
 price = 5

If this is still unclear, I'll post a new question based on findings from
this conversation. Thanks for all of your help.
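The loss of correspondence described above can be demonstrated in a few lines; the data mirrors the SKUs listed earlier, and the "matching" logic is only a toy stand-in for what Solr does with multi-valued fields:

```python
skus = [
    {"color": "green", "size": "L", "price": 10.99},
    {"color": "blue",  "size": "S", "price": 9.99},
    {"color": "blue",  "size": "M", "price": 10.99},
]

def product_doc(skus):
    """Flatten SKUs into multi-valued fields, as in the product-level schema."""
    return {k: {s[k] for s in skus} for k in ("color", "size", "price")}

doc = product_doc(skus)

# Product-level match: each criterion is checked against the whole value set,
# so the green / M / <= 9.99 query "matches" even though no single SKU
# satisfies all three at once.
product_match = ("green" in doc["color"] and "M" in doc["size"]
                 and any(p <= 9.99 for p in doc["price"]))

# SKU-level match: all criteria must hold on the same SKU.
sku_match = any(s["color"] == "green" and s["size"] == "M" and s["price"] <= 9.99
                for s in skus)
```

Here `product_match` is true while `sku_match` is false, which is exactly the false positive that pushes the design toward SKU-level documents plus some form of deduplication or grouping of results back to products.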

-Kelly


Markus Jelsma - Buyways B.V. wrote:
 
 Hello Kelly,
 
 
 Simple boolean algebra, you tell Solr you want color = green AND size = M
 so it will only return green t-shirts in size M. If you, however, turn the
 AND in a OR it will return all t-shirts that are green OR in size M, thus
 you can then get M sized shirts in the blue color or green shirts in size
 XXL.
 
 I suggest you'd just give it a try and perhaps come back later to find
 some improvements for your query. It would also be a good idea - if i may
 say so - to read the links provided in the earlier message.
 
 Hope you will find what you're looking for :)
 
 
 Cheers,
 
 
 
 
 
 

-- 
View this message in context: 

Re: Commons Lang

2010-01-11 Thread Erik Hatcher
There's no point in moving it to Solr core unless something in core  
depends on it.


The VelocityResponseWriter depends on commons-lang, though, and I am  
aiming to integrate that into core at some point.


But, you can put commons-lang in your solr-home/lib and your plugin  
will be able to see it fine.


Erik


On Jan 11, 2010, at 4:39 PM, Jeff Newburn wrote:

We have a solr plugin that would be much easier to write if commons- 
lang was
available.  Why does solr not have this library?  Is there any  
drawbacks to

pulling in the commons lang for StringUtils?
--
Jeff Newburn
Software Engineer, Zappos.com




Re: Tokenizer question

2010-01-11 Thread Chris Hostetter

: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
: 
: the resulting parsed query contains a phrase query:
: 
: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43)

This stems from some fairly fundamental behavior in the QueryParser ... 
each chunk of input that isn't deemed markup (ie: not field names, or 
special characters) is sent to the analyzer.  If the analyzer produces 
multiple tokens at different positions, then a PhraseQuery is constructed. 
-- Things like simple phrase searches and N-Gram based partial matching 
require this behavior.

If the analyzer produces multiple Tokens, but they all have the same 
position, then the QueryParser produces a BooleanQuery with all SHOULD 
clauses.  -- This is what allows simple synonyms to work.

If you write a simple TokenFilter to flatten all of the positions to be 
the same, and use it after WordDelimiterFilter, then it should give you the 
OR-style query you want.
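The position-based decision described here can be sketched as a tiny model (illustrative Python only — the real logic lives in Lucene's QueryParser, and this simplification is mine):

```python
def query_type(tokens):
    """tokens: list of (text, position_increment) pairs as an analyzer
    would emit them.  A very simplified model of how the query parser
    picks a query type from analyzer output."""
    if len(tokens) <= 1:
        return "TermQuery"
    # A position increment of 0 means "same position as the previous token".
    if all(inc == 0 for _, inc in tokens[1:]):
        return "BooleanQuery (all SHOULD clauses)"   # synonym-style tokens
    return "PhraseQuery"                             # tokens at new positions

# WordDelimiterFilter splits "39-43" into tokens at different positions:
print(query_type([("39", 1), ("43", 1)]))   # PhraseQuery
# After a flattening filter that zeroes the increments:
print(query_type([("39", 1), ("43", 0)]))   # BooleanQuery (all SHOULD clauses)
```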

This isn't the default behavior because the phrase behavior of WDF fits 
its intended case better --- someone searching for a product SKU 
like X3QZ-D5 expects it to match X-3QZD5, but not just X or 3QZ.

-Hoss



Re: Tokenizer question

2010-01-11 Thread Avlesh Singh

 If the analyzer produces multiple Tokens, but they all have the same
 position, then the QueryParser produces a BooleanQuery with all SHOULD
 clauses.  -- This is what allows simple synonyms to work.

You rock Hoss!!! This is exactly the explanation I was looking for .. it is
as simple as it sounds. Thanks!

Cheers
Avlesh





Re: Understanding the query parser

2010-01-11 Thread Avlesh Singh
Thanks Erik for responding.
Hoss explained the behavior with nice corollaries here -
http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question

Cheers
Avlesh

On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote:


 On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:


It is in the source code of QueryParser's getFieldQuery(String field,
String queryText) method, line #660. If numTokens > 1 it returns a
PhraseQuery.

  That's exactly the question. Would be nice to hear from someone as to
  why it is that way?


 Suppose you indexed Foo Bar.  It'd get indexed as two tokens [foo]
 followed by [bar].  Then someone searches for foo-bar, which would get
 analyzed into two tokens also.  A PhraseQuery is the most logical thing for
 it to turn into, no?

 What's the alternative?

 Of course it's tricky business though, impossible to do the right thing for
 all cases within SolrQueryParser.  Thankfully it is pleasantly subclassable
 and overridable for this method.

Erik




Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Kelly Taylor

Hi,

Is there a step-by-step for applying the patch for SOLR-236 to enable field
collapsing in Solr 1.4?

Thanks,
Kelly
-- 
View this message in context: 
http://old.nabble.com/Solr-1.4-Field-collapsing---What-are-the-steps-for-applying-the-SOLR-236-patch--tp27122621p27122621.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Joe Calderon
It seems to be in flux right now as the Solr developers slowly make 
improvements and ingest the various pieces into the Solr trunk. I think 
your best bet might be to use the 12/24 patch and fix any errors where 
it doesn't apply cleanly.


I'm using Solr trunk r892336 with the 12/24 patch.


--joe
On 01/11/2010 08:48 PM, Kelly Taylor wrote:

Hi,

Is there a step-by-step for applying the patch for SOLR-236 to enable field
collapsing in Solr 1.4?

Thanks,
Kelly
   




Seattle Hadoop / HBase / Lucene / NoSQL meetup Jan 27th!

2010-01-11 Thread Bradford Stephens
Greetings,

A friendly reminder that the Seattle Hadoop, NoSQL, etc. meetup is on
January 27th at University of Washington in the Allen Computer Science
Building, room 303.

I believe Razorfish will be giving a talk on how they use Hadoop.

Here's the new, shiny meetup.com link with more detail:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup

-- 
http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: Tokenizer question

2010-01-11 Thread rswart


Cristal clear. Thanks for your responsetime!
-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27123281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: update solr index

2010-01-11 Thread Shalin Shekhar Mangar
On Mon, Jan 11, 2010 at 7:42 PM, Marc Des Garets marc.desgar...@192.com wrote:


 I am running solr in tomcat and I have about 35 indexes (between 2 and
 80 millions documents each). Currently if I try to update few documents
 from an index (let's say the one which contains 80 millions documents)
 while tomcat is running and therefore receiving requests, I am getting
 few very long garbage collection (about 60sec). I am running tomcat with
 -Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using
 ConcMarkSweepGC.

 I have 2 questions:
 1. Is solr doing something specific while an index is being updated like
 updating something in memory which would cause the garbage collection?


Solr's caches are thrown away and a fixed number of old queries are
re-executed to re-generate the cache on the new index (known as
auto-warming). This happens on a commit.
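The amount of auto-warming work is controlled per cache in solrconfig.xml. A sketch of the relevant fragment (the sizes and counts below are arbitrary examples, not recommendations — lowering autowarmCount, or setting it to 0, reduces the work done on each commit):

```xml
<!-- solrconfig.xml fragment: autowarmCount is how many entries from the
     old cache are re-executed against the new searcher on commit;
     0 disables warming for that cache. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="64"/>
```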



 2. Any idea how I could solve this problem? Currently I stop tomcat,
 update index, start tomcat. I would like to be able to update my index
 while tomcat is running. I was thinking about running more tomcat
 instance with less memory for each and each running few of my indexes.
 Do you think it would be the best way to go?


If you stop tomcat, how do you update the index? Are you running a
multi-core setup? Perhaps it is better to split up the indexes among
multiple boxes. Also, you should probably lower the JVM heap so that the
full GC pause doesn't make your index unavailable for such a long time.

Also see
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

-- 
Regards,
Shalin Shekhar Mangar.


What is this error means?

2010-01-11 Thread Ellery Leung
When I am building the index for around 2 ~ 25000 records, sometimes I
come across this error:

 

Uncaught exception Exception with message '0' Status: Communication Error

 

I searched Google & Yahoo but found no answer.

 

I am now committing documents to Solr every 10 records fetched from an
SQLite database with PHP 5.3.

 

Platform: Windows 7 Home

Web server: Nginx

Solr Specification Version: 1.4.0

Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
12:33:40

Lucene Specification Version: 2.9.1

Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

Solr hosted in jetty 6.1.3

 

All the above are in one single test machine.

 

The situation is that sometimes when I build the index, it can be created
successfully.  But sometimes it will just stop with the above error.

 

Any clue?  Please help.

 

Thank you in advance.