Re: Implementing Search Suggestion on Solr

2010-10-20 Thread Pablo Recio
Yeah, I know.

Could anyone tell me which one is the right way?

Regards,
 What an interesting application :-)

 Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

 EARTH has a Right To Life,
 otherwise we all die.


 --- On Mon, 10/18/10, Pablo Recio Quijano pre...@yaco.es wrote:

 From: Pablo Recio Quijano pre...@yaco.es
 Subject: Implementing Search Suggestion on Solr
 To: solr-user@lucene.apache.org
 Date: Monday, October 18, 2010, 3:53 AM
 Hi!

 I'm trying to implement some kind of Search Suggestion on a
 search engine I have implemented. These search suggestions
 should not be automatic like the one described for the
 SpellCheckComponent [1]. I'm looking for something like:

 "SAS oppositions" => "Public job offers for some-company"

 So I will have to define it manually. I was thinking about
 synonyms [2] but I don't know if it's the proper way to do
 it, because semantically those terms are not synonyms.

 Any ideas or suggestions?

 Regards,

 [1] http://wiki.apache.org/solr/SpellCheckComponent
 [2]
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
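
For reference, Solr's synonyms.txt supports explicit one-way mappings, which fits phrases that are not true synonyms; a hedged sketch of what such an entry could look like (the right-hand terms are illustrative):

```text
# synonyms.txt: map the query phrase to the terms that should actually match
SAS oppositions => public job offers some-company
```

Note that multi-word synonyms have known limitations when applied at query time, so applying the SynonymFilterFactory in the index-time analyzer is often the safer choice.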



Re: Lucene vs Solr

2010-10-20 Thread Pradeep Singh
Is that right?

On Tue, Oct 19, 2010 at 11:08 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Hello all,

 I have posted an article Lucene vs Solr
 http://www.findbestopensource.com/article-detail/lucene-vs-solr

 Please feel free to add your comments.

 Regards
 Aditya
 www.findbestopensource.com



Re: SolrJ new javabin format

2010-10-20 Thread Shawn Heisey

 On 10/19/2010 2:40 PM, Chris Hostetter wrote:

The formats are not currently compatible.  The first priority was to get
the format fixed so it was using true UTF8 (instead of Java's bastardized
modified UTF8) in a way that would generate a clear error if people
attempted to use an older SolrJ to talk to a newer Solr server (or vice
versa).

The consensus was that fixing that problem was worth the added complexity
during upgrading -- people that want to use SolrJ 1.4 to talk to a Solr
3.x server can always use the XML format instead of the binary format.


What happens with distributed search, which uses javabin behind the 
scenes?  I don't query my actual index machines with a shards parameter, 
I have dedicated brokers (with empty indexes) that have the shards 
parameter included in the request handler, pointed at load balancer IP 
addresses.  Is there any way to have that use XML instead of javabin, or 
do I need to be cautious about not mixing versions during the upgrade?


Thanks,
Shawn



RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jason Brown
Thanks Jonathan.
 
To further clarify, I understand that the match of
 
my blue rabbit
 
would have to be found in 1 element (of my multi-valued defined field) for the 
phrase boost on that field to kick in.
 
If for example my document had the following 3 entries for the multi-value 
field
 
 
my black cat
his blue car
her pink rabbit
 
Then I assume the phrase boost would not kick in, as the search term (my blue 
rabbit) isn't found in a single element (but can be found across them).
 
Thanks again
 
Jason.



From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Tue 19/10/2010 17:27
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields



You are correct.  The query needs to match as a phrase. It doesn't need
to match everything. Note that if a value is:

long sentence with my blue rabbit in it,

then the query my blue rabbit will also match as a phrase, for phrase
boosting or query purposes.

Jonathan

Jason Brown wrote:
 

 Hi - I have a multi-value field, so say for example it consists of

 'my black cat'
 'my white dog'
 'my blue rabbit'

 The field is whitespace parsed when put into the index.

 I have a phrase query boost configured on this field which I understand kicks 
 in when my search term is found entirely in this field.

 So, if the search term is 'my blue rabbit', then I understand that my phrase 
 boost will be applied, as this is found entirely in this field.

 My question/presumption is that as this is a multi-valued field, only 1 value 
 of the multi-value needs to match for the phrase query boost (given my very 
 imaginative set of test data :-) above, you can see that this obviously 
 matches 1 value and not them all)

 Thanks for your help.






 If you wish to view the St. James's Place email disclaimer, please use the 
 link below

 http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer

  





Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
I tried this work-around, but it seems not to work for me.
I still get array of score in the response.

I have two physical server A and B

localhost -- A
test --B

I issue query to A like this

http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard
Hi Hoss,

But when I change query to

http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

The score will be normal. (That's just like issuing the query to test:8983.)

any idea?



2010/10/16 Chris Hostetter hossman_luc...@fucit.org


 : Thanks. But do you have any suggest or work-around to deal with it?

 Posted in SOLR-2140

   <field name="score" type="ignored" multiValued="false" />

 ...the key is to make sure Solr knows score is not multiValued


 -Hoss



Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
OK, I did a little test after my previous email. The work-around that Hoss
provided does not work when you issue the query *:*.

I tried to issue a query like key:aaa and the work-around works no matter
whether there are one, two, or more shard nodes.

Thanks Hoss. And maybe you could try it and help me confirm this situation is
not a coincidence.




2010/10/20 Floyd Wu floyd...@gmail.com

 I tried this work-around, but it seems not to work for me.
 I still get array of score in the response.

 I have two physical server A and B

 localhost -- A
 test --B

 I issue query to A like this


  http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard
 Hi Hoss,

 But when I change query to

 http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

 The score will be normal. (That's just like issuing the query to test:8983.)

 any idea?



 2010/10/16 Chris Hostetter hossman_luc...@fucit.org


 : Thanks. But do you have any suggest or work-around to deal with it?

 Posted in SOLR-2140

    <field name="score" type="ignored" multiValued="false" />

  ...the key is to make sure Solr knows score is not multiValued


 -Hoss





Re: Multiple partial word searching with dismax handler

2010-10-20 Thread Chamnap Chhorn
Can anyone suggest how to do multiple partial-word searching?

On Wed, Oct 20, 2010 at 11:42 AM, Chamnap Chhorn chamnapchh...@gmail.comwrote:

 Hi,

 I have some problem with combining the query with multiple partial-word
 searching in the dismax handler. In order to do multiple partial-word
 searching, I use EdgeNGramFilterFactory, and my query must be something like
 this: name_ngram:sun name_ngram:hot in q.alt, combined with my search
 handler (
 http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&qt=products).
 I wonder how I combine this with my search handler.

 Here is my search handler config:
   <requestHandler name="products" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">20</int>
       <str name="defType">dismax</str>
       <str name="qf">name^200 full_text</str>
       <str name="bf">fap^15</str>
       <str name="fl">uuid</str>
       <str name="version">2.2</str>
       <str name="indent">on</str>
       <str name="tie">0.1</str>
     </lst>
     <lst name="appends">
       <str name="fq">type:Product</str>
     </lst>
     <lst name="invariants">
       <str name="facet">false</str>
     </lst>
     <arr name="last-components">
       <str>spellcheck</str>
       <str>elevateProducts</str>
     </arr>
   </requestHandler>
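
A name_ngram field of the kind used here would typically be backed by an edge n-gram analyzer applied at index time only; a hedged sketch (the type name and attribute values are assumptions):

```xml
<!-- hypothetical fieldType backing the name_ngram field -->
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name_ngram" type="edge_ngram" indexed="true" stored="false"/>
```

With such a field in place, adding it to the handler's qf (for example qf=name^200 name_ngram^50 full_text) would let the plain dismax q handle partial words without resorting to q.alt.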

 If I query with this url
 http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&q=sun+hot&qt=products,
 it doesn't show the correct answer like the previous query.

 How could I configure this in my search handler with boost scores?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: xi:include

2010-10-20 Thread Stefan Matheis
Wouldn't it be easier to ensure that your config.aspx returns valid
xml? Wrap your existing code with some exception handling and return your
fallback xml if something goes wrong?


Searching with Number fields

2010-10-20 Thread Hasnain

Hi,

   I'm having trouble with searching with number fields. If this field has
alphanumerics then search works perfectly, but not with all numbers. Can
anyone suggest a solution?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
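
One possible cause, assuming the actual number field is mapped to a numeric fieldType rather than the text type above, is that numeric fields bypass text analysis entirely. Copying the value into an analyzed field is a common workaround; a hedged sketch (field names are illustrative):

```xml
<!-- hypothetical: keep the raw value, and search against an analyzed copy -->
<field name="item_number" type="string" indexed="true" stored="true"/>
<field name="item_number_search" type="text" indexed="true" stored="false"/>
<copyField source="item_number" dest="item_number_search"/>
```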


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-with-Number-fields-tp1737513p1737513.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: xi:include

2010-10-20 Thread Peter A. Kirk
Hi

Thanks for your reply. In actual fact, the config.aspx will either return a 
valid xml, or it will return an empty string - and unfortunately an empty 
string is not considered valid xml by the Solr xml parser.

The config.aspx is a rather general application, returning all sorts of data, 
depending on the parameters supplied to it. It doesn't know what fallback xml 
is appropriate in a specific instance.

For example, it might be called like this:
http://localhost/config/config.aspx?id=core1&dismaxweight=qf

But if configuration in solrconfig.xml is entered incorrectly (eg maybe the 
id parameter to config.aspx is incorrect) then config.aspx returns an empty 
string.

Other xml parsers which handle xi:include and xi:fallback actually invoke the 
fallback if any error occurs during the include (not only if the include 
resource does not exist). Is it possible to configure the Solr parser so it 
invokes the fallback on any error?

Thanks,
Peter
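
For reference, the XInclude form under discussion looks roughly like this (the href and fallback content are placeholders); the open question is whether Solr's parser can be made to invoke the fallback on any include error, not only on a missing resource:

```xml
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
            href="http://localhost/config/config.aspx?id=core1">
  <xi:fallback>
    <!-- safe defaults used when the include fails -->
  </xi:fallback>
</xi:include>
```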


From: Stefan Matheis [matheis.ste...@googlemail.com]
Sent: Wednesday, 20 October 2010 22:05
To: solr-user@lucene.apache.org
Subject: Re: xi:include

Wouldn't it be easier to ensure that your config.aspx returns valid
xml? Wrap your existing code with some exception handling and return your
fallback xml if something goes wrong?

Solr on WebSphere 7

2010-10-20 Thread laurent altaber
Hello experts,

Has anyone succeeded in configuring and running Solr on WebSphere 7, and would
be kind enough to help me with this?

New to Solr and WebSphere, I am looking for any hints on how to
configure Solr on WebSphere 7. I was able to configure and run it on
Tomcat and from the embedded Jetty.

The wiki page is very poor on this particular application server; any
help would be greatly appreciated.


Thanks,

Veraz


Re: Not able to subscribe to ML

2010-10-20 Thread Tharindu Mathew
I had the same problem. The work around was to send mails in plain text.

On Wed, Oct 20, 2010 at 10:21 AM, Abdullah Shaikh
abdullah.shaik...@gmail.com wrote:
 Just a test mail to check if my mails are reaching the ML.

 I don't know, but my mails are failing to reach the ML with the following
 error :

 Delivery to the following recipient failed permanently:

    solr-u...@lucene.apache.org

 Technical details of permanent failure:
 Google tried to deliver your message, but it was rejected by the recipient
 domain. We recommend contacting the other email provider for further
 information about the cause of this error. The error that the other server
 returned was: 552 552 spam score (5.7) exceeded threshold (state 18).


 - Abdullah




-- 
Regards,

Tharindu


Announcing Blaze - Appliance for Solr

2010-10-20 Thread Initcron Labs
Initcron Labs announces Blaze - Appliance for Solr.

Read more at and download from: http://www.initcron.org/blaze

Blaze is a tailor-made appliance, preinstalled and preconfigured with Apache
Solr running within the Tomcat servlet container. It lets you focus on
developing applications based on the Apache Solr platform and not worry about
installation and configuration complexities.

The Blaze Appliance is built with SUSE Studio and is available in the following
formats:

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You can get your Solr installation set up and running within minutes.
The appliance is also production ready, being configured with Tomcat. It comes
with WebYaST for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org



Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
Hi everyone! (my first post)

I am new, but really curious about the usefulness of lucene/solr for document
search from web applications. I use Ruby on Rails to create one, with the
plugin acts_as_solr_reloaded that makes the connection between the web app and
solr easy.

So I am in a point, where I know that good solution is to prepare
multi-language documents with fields like:
question_en, answer_en,
question_fr, answer_fr,
question_pl,  answer_pl... etc.

I need to create an index that would work with 6 languages: english, french,
german, russian, ukrainian and polish.

My questions are:
1. Is it doable to have just one search field that behaves like Google's for
all those documents? It can be an option to indicate a language to search.
2. How should I begin changing the solr/conf/schema.xml (or other) file to
tailor it to my needs? As I am a real rookie here, I am still a bit confused
about fields, fieldTypes and their connection with a particular field (ex.
answer_fr) and the tokenizers and analyzers. If someone can provide a
basic step by step tutorial on how to make it work in two languages I would
be more than happy.
3. Are all those languages supported (officially/unofficially) by
lucene/solr?

Thank you for help,
Jakub Godawa.
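
As a starting point for question 2, the usual pattern is one field type per language, each with its own analyzer chain; a hedged sketch for two of the six languages (the type names are assumptions, and the Snowball stemmers shown ship with Solr):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>

<field name="question_en" type="text_en" indexed="true" stored="true"/>
<field name="answer_en"   type="text_en" indexed="true" stored="true"/>
<field name="question_fr" type="text_fr" indexed="true" stored="true"/>
<field name="answer_fr"   type="text_fr" indexed="true" stored="true"/>
```

A single search box can then be routed to the per-language field pair chosen from the user's locale.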


Re: Announcing Blaze - Appliance for Solr

2010-10-20 Thread Stefan Moises

 Sounds good, but there is nothing to download on Sourceforge?
Is this free or do you charge for it?

Cheers,
Stefan

Am 20.10.2010 13:03, schrieb Initcron Labs:

Initcron Labs Announces Blaze - Appliance for Solr .

Read more at and download from :  http://www.initcron.org/blaze

Blaze is a tailor made appliance  preinstalled and preconfigured with Apache
Solr  running within Tomcat servlet  container. It  lets you focus on
developing applications based on Apache Solr platform  and not worry about
installation, configuration complexities.

Blaze Appliance is built with Suse Studio and is available in following
formats

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You could get your solr installation setup and running within minutes.
The appliance is also production ready being configured with Tomcat. Comes
with webyast for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



Re: Announcing Blaze - Appliance for Solr

2010-10-20 Thread Stefan Matheis
Did you visit http://sourceforge.net/projects/blazeappliance/files/ ?
There are currently Blaze__Appliance_for_Solr.i686-0.1.1.oem.tar.gz (412MB)
and Blaze__Appliance_for_Solr.i686-0.1.1.ovf.tar.gz (434MB) to download

On Wed, Oct 20, 2010 at 3:23 PM, Stefan Moises moi...@shoptimax.de wrote:

  Sounds good, but there is nothing to download on Sourceforge?
 Is this free or do you charge for it?

 Cheers,
 Stefan

 Am 20.10.2010 13:03, schrieb Initcron Labs:

  Initcron Labs Announces Blaze - Appliance for Solr .

 Read more at and download from :  http://www.initcron.org/blaze

 Blaze is a tailor made appliance  preinstalled and preconfigured with
 Apache
 Solr  running within Tomcat servlet  container. It  lets you focus on
 developing applications based on Apache Solr platform  and not worry about
 installation, configuration complexities.

 Blaze Appliance is built with Suse Studio and is available in following
 formats

 - LiveCD
 - USB Drive/ HDD Image
 - Preload ISO
 - Virtual Machine Images
 - Xen
 - VMWare, Virtualbox
 - OVM Open Format
 - Amazon EC2 Image Format


 You could get your solr installation setup and running within minutes.
 The appliance is also production ready being configured with Tomcat. Comes
 with webyast for web administration and configuration of the appliance.



 Thanks

 Initcron Labs

 www.initcron.org



 --
 ***
 Stefan Moises
 Senior Softwareentwickler

 shoptimax GmbH
 Guntherstraße 45 a
 90461 Nürnberg
 Amtsgericht Nürnberg HRB 21703
 GF Friedrich Schreieck

 Tel.: 0911/25566-25
 Fax:  0911/25566-29
 moi...@shoptimax.de
 http://www.shoptimax.de
 ***




Re: Boosting documents based on the vote count

2010-10-20 Thread Alexandru Badiu
Thanks, will look into those.

Andu

On Mon, Oct 18, 2010 at 4:14 PM, Ahmet Arslan iori...@yahoo.com wrote:
 I know but I can't figure out what
 functions to use. :)

 Oh, I see. Why not just use {!boost b=log(vote)}?

 May be scale(vote,0.5,10)?






Shards VS Merged Core?

2010-10-20 Thread ahammad

Hello all,

I'm just wondering what the benefits/consequences are of using shards or
merging all the cores into a single core. Personally I have tried both, but
my document set is not large enough that I can actually test performance and
whatnot.

What is a better approach of implementing a search mechanism on multiple
cores (10-15 cores)?
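
For reference, the shards parameter is how a single query can span several cores without merging them; a hedged example (host and core names are placeholders):

```text
http://host:8983/solr/core0/select?q=foo&shards=host:8983/solr/core0,host:8983/solr/core1,host:8983/solr/core2
```

The trade-off is roughly: shards keep indexing independent per core but add a distributed-search hop per query, while one merged core avoids that hop at the cost of rebuilding everything together.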
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Shards-VS-Merged-Core-tp1738771p1738771.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Announcing Blaze - Appliance for Solr

2010-10-20 Thread Stefan Moises
 Oh, I guess they have just uploaded it... when I checked, the file
list was empty :)


Am 20.10.2010 15:36, schrieb Stefan Matheis:

Did you visit http://sourceforge.net/projects/blazeappliance/files/ ?
There are currently Blaze__Appliance_for_Solr.i686-0.1.1.oem.tar.gz (412MB)
and Blaze__Appliance_for_Solr.i686-0.1.1.ovf.tar.gz (434MB) to download

On Wed, Oct 20, 2010 at 3:23 PM, Stefan Moisesmoi...@shoptimax.de  wrote:


  Sounds good, but there is nothing to download on Sourceforge?
Is this free or do you charge for it?

Cheers,
Stefan

Am 20.10.2010 13:03, schrieb Initcron Labs:

  Initcron Labs Announces Blaze - Appliance for Solr .

Read more at and download from :  http://www.initcron.org/blaze

Blaze is a tailor made appliance  preinstalled and preconfigured with
Apache
Solr  running within Tomcat servlet  container. It  lets you focus on
developing applications based on Apache Solr platform  and not worry about
installation, configuration complexities.

Blaze Appliance is built with Suse Studio and is available in following
formats

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You could get your solr installation setup and running within minutes.
The appliance is also production ready being configured with Tomcat. Comes
with webyast for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Dennis Gearon
There's approximately a 100% chance that you are going to go through a server 
side language (PHP, Ruby, Perl, Java, VB/ASP/.NET [cough, cough]) before you get 
to Solr/Lucene. I'd recommend it anyway.

This code should look at the user's browser locale (en_US, pl_PL, es_CO, 
etc). The server side language would then choose which language to search by and 
display.

NOW, that being said, are you going to have the exact same content for all 
languages, just translated? The temptation would be to translate to a common 
language like English, then do the search, then get the translation. I wouldn't 
recommend it, but I'm no expert. Translation of single words can be OK, but 
multiword ideas and especially sentences don't work so well that way.

You probably will have separate content for that reason, AND another. Different 
cultures are interested in different things and only have common ground on 
certain things like international news (but with different opinions) and medical 
news. So different content for different cultures speaking different languages.

Are you trying to address different languages in some place like the US or Great 
Britain, with LOTS of different languages spoken in minority cultures? Only 
then would you want a geographically centered server and information gathering 
organization. If you were going to have search for other countries, then I'd 
recommend those resources be geographically close to their source culture.
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Wed, 10/20/10, Jakub Godawa jakub.god...@gmail.com wrote:

 From: Jakub Godawa jakub.god...@gmail.com
 Subject: Step by step tutorial for multi-language indexing and search
 To: solr-user@lucene.apache.org
 Date: Wednesday, October 20, 2010, 6:03 AM
 Hi everyone! (my first post)
 
 I am new, but really curious about usefullness of
 lucene/solr in documents
 search from the web applications. I use Ruby on Rails to
 create one, with
 plugin acts_as_solr_reloaded that makes connection
 between web app and
 solr easy.
 
 So I am in a point, where I know that good solution is to
 prepare
 multi-language documents with fields like:
 question_en, answer_en,
 question_fr, answer_fr,
 question_pl,  answer_pl... etc.
 
 I need to create an index that would work with 6 languages:
 english, french,
 german, russian, ukrainian and polish.
 
 My questions are:
 1. Is it doable to have just one search field that behaves
 like Google's for
 all those documents? It can be an option to indicate a
 language to search.
 2. How should I begin changing the solr/conf/schema.xml (or
 other) file to
 tailor it to my needs? As I am a real rookie here, I am
 still a bit confused
 about fields, fieldTypes and their connection with
 particular field (ex.
 answer_fr) and the tokenizers and analyzers. If someone
 can provide a
 basic step by step tutorial on how to make it work in two
 languages I would
 be more that happy.
 3. Do all those languages are supported
 (officially/unofficialy) by
 lucene/solr?
 
 Thank you for help,
 Jakub Godawa.



Re: why solr search is slower than lucene so much?

2010-10-20 Thread Yonik Seeley
Careful comparing apples to oranges ;-)
For one, your lucene code doesn't retrieve stored fields.
Did you try the solr request more than once (with a different q, but
the same filters?)

Also, by default, Solr independently caches the filters.  This can be
higher up-front cost, but a win when filters are reused.  If you want
something closer to your lucene code, you could add all the filters to
 the main query and not use fq.

-Yonik
http://www.lucidimagination.com
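
Folding the filters into the main query, as suggested above, would look roughly like this (a sketch assuming the default lucene query parser):

```text
# separate, independently cached filters
q=xx&fq=fid:1&fq=atm:[int_time1 TO int_time2]

# everything in the main query, closer to the raw Lucene BooleanQuery
q=xx AND fid:1 AND atm:[int_time1 TO int_time2]
```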



On Wed, Oct 20, 2010 at 7:07 AM, kafka0102 kafka0...@163.com wrote:
 HI.
 my solr seach has some performance problem recently.
 my query is like that: q=xxfq=fid:1fq=atm:[int_time1 TO int_time2],
 fid's type is : fieldType name=int class=solr.TrieIntField
 precisionStep=0 omitNorms=true positionIncrementGap=0/
 atm's type is : fieldType name=sint class=solr.TrieIntField
 precisionStep=8 omitNorms=true positionIncrementGap=0/
 my index's size is about 500M and record num is 3984274.
 when I use solr's SolrIndexSearcher.search(QueryResult qr, QueryCommand
 cmd), it cost about70ms. When I changed use lucence'API, just like bottom:

      final SolrQueryRequest req = rb.req;
      final SolrIndexSearcher searcher = req.getSearcher();
      final SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
      final ExecuteTimeStatics timeStatics =
 ExecuteTimeStatics.getExecuteTimeStatics();
      final ExecuteTimeUnit staticUnit =
 timeStatics.addExecuteTimeUnit(test2);
      staticUnit.start();
      final ListQuery query = cmd.getFilterList();
      final BooleanQuery booleanFilter = new BooleanQuery();
      for (final Query q : query) {
        booleanFilter.add(new BooleanClause(q,Occur.MUST));
      }
      booleanFilter.add(new BooleanClause(cmd.getQuery(),Occur.MUST));
      logger.info(q:+query);
      final Sort sort = cmd.getSort();
      final TopFieldDocs docs = searcher.search(booleanFilter,null,20,sort);
      final StringBuilder sbBuilder = new StringBuilder();
      for (final ScoreDoc doc :docs.scoreDocs) {
        sbBuilder.append(doc.doc+,);
      }
      logger.info(hits:+docs.totalHits+,result:+sbBuilder.toString());
      staticUnit.end();

 it cost only about 20ms.
 I'm so confused. For solr's config, I closed cache. For test, I first called
 lucene's, and then solr's.
 Maybe I should look solr's source more carefully. But now, can anyone knows
 the reason?





Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
2010/10/20 Dennis Gearon gear...@sbcglobal.net

 There's approximately a 100% chance that you are going to go through a
 server side language (PHP, Ruby, Perl, Java, VB/ASP/.NET [cough, cough]),
 before you get to Solr/Lucene. I'd recommend it anyway.


I use a server side language (Ruby) as I build the web application.


 This code should look at the user's browser locale (en_US, pl_PL,
 es_CO, etc). The server side language would then choose which language to
 search by and display.


As I said, I may provide locale as an addition to the search query.


 NOW, that being said, are you going to have the exact same content for all
 languages, just translated? The temptation would be to translate to a common
 language like English, then do the search, then get the translation. I
 wouldn't recommend it, but I'm no expert. Translation of single words can be
 OK, but multiword ideas and especially sentences don't work so well that
 way.


I would like not to yield to that temptation. I know that Solr/Lucene can work
with many languages and I think it has a purpose, like languages' semantic
diversity. What's more, you often don't translate things literally even if
they are just translations.


 You probably will have separate content for that reason, AND another.
 Different cultures are interested in different things and only have common
 ground on certain things like international news (but with different
 opinions) and medical news. So different content for different cultures
 speaking different languages.


I need to treat each culture separately regarding the subject of the query.


 Are you trying to address different languages in some place like the US or
 Great Britain, with LOTS of different languages spoken in minority cultures?
 Only then would you want a geographically centered server and information
 gathering organization. If you were going to have search for other
 countries, then I'd recommend those resources be geographically close to
 their source culture.


No, I am not trying to address minority cultures.

Thanks for answer,
Jakub Godawa.

Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make them
 yourself. from '
 http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

 EARTH has a Right To Life,
  otherwise we all die.


 --- On Wed, 10/20/10, Jakub Godawa jakub.god...@gmail.com wrote:

  From: Jakub Godawa jakub.god...@gmail.com
  Subject: Step by step tutorial for multi-language indexing and search
  To: solr-user@lucene.apache.org
  Date: Wednesday, October 20, 2010, 6:03 AM
  Hi everyone! (my first post)
 
  I am new, but really curious about usefullness of
  lucene/solr in documents
  search from the web applications. I use Ruby on Rails to
  create one, with
  plugin acts_as_solr_reloaded that makes connection
  between web app and
  solr easy.
 
  So I am in a point, where I know that good solution is to
  prepare
  multi-language documents with fields like:
  question_en, answer_en,
  question_fr, answer_fr,
  question_pl,  answer_pl... etc.
 
  I need to create an index that would work with 6 languages:
  english, french,
  german, russian, ukrainian and polish.
 
  My questions are:
  1. Is it doable to have just one search field that behaves
  like Google's for
  all those documents? It can be an option to indicate a
  language to search.
  2. How should I begin changing the solr/conf/schema.xml (or
  other) file to
  tailor it to my needs? As I am a real rookie here, I am
  still a bit confused
  about fields, fieldTypes and their connection with
  particular field (ex.
  answer_fr) and the tokenizers and analyzers. If someone
  can provide a
  basic step by step tutorial on how to make it work in two
  languages I would
  be more that happy.
  3. Do all those languages are supported
  (officially/unofficialy) by
  lucene/solr?
 
  Thank you for help,
  Jakub Godawa.
 



Mulitple facet - fq

2010-10-20 Thread Yavuz Selim YILMAZ
Under the category facet there are multiple selections, which can be
personal, corporate or other.

How can I get both personal and corporate ones, I tried
fq=category:corporatefq=category:personal

It looks easy, but I can't find the solution.


--

Yavuz Selim YILMAZ
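
Note that multiple fq parameters intersect (each one further restricts the result set), so matching either value is normally expressed as a single filter with OR; a hedged example:

```text
fq=category:(corporate OR personal)
```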


London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Hi all,

We've booked a London Search Social for Thursday the 28th Sept. Come
along if you fancy geeking out about search and related technology
over a beer.

Please note that we're not meeting in the same place as usual. Details
on the meetup page.
http://www.meetup.com/london-search-social/

Rich


Re: London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Wow, apologies for utter stupidity. Both subject line and body should
have read 28th OCT.



On 20 October 2010 15:42, Richard Marr richard.m...@gmail.com wrote:
 Hi all,

 We've booked a London Search Social for Thursday the 28th Sept. Come
 along if you fancy geeking out about search and related technology
 over a beer.

 Please note that we're not meeting in the same place as usual. Details
 on the meetup page.
 http://www.meetup.com/london-search-social/

 Rich




-- 
Richard Marr


Re: Multiple facet - fq

2010-10-20 Thread Markus Jelsma
It should work fine. Make sure the field is indexed and check your index.

On Wednesday 20 October 2010 16:39:03 Yavuz Selim YILMAZ wrote:
 Under the category facet, there are multiple selections, which can be
 personal, corporate or other.
 
 How can I get both personal and corporate ones, I tried
 fq=category:corporatefq=category:personal
 
 It looks easy, but I can't find the solution.
 
 
 --
 
 Yavuz Selim YILMAZ

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread bbarani

Hi,

I have a very common question but couldn't find any post related to it in
this forum.

I currently initiate a full import each week, but data that has been deleted
in the source is not removed from my index since I am using clean=false.

We index multiple data sources by data type, hence we can't delete the index
and do a complete re-indexing each week. We also want to delete the orphan
Solr documents (for which the data is no longer present in the back-end DB)
on a daily basis.

Now my question is: is there a way I can use preImportDeleteQuery to delete
the documents from Solr for which the data doesn't exist in the back-end DB?
I don't have anything like a delete status in the DB; instead I need to get
all the UIDs from the Solr index, compare them with the UIDs in the back end,
and delete the Solr documents whose UIDs are not present in the DB.

Any suggestions / ideas would be of great help.

Note: I have currently developed a simple program which fetches the UIDs from
the Solr index, connects to the back-end DB to find the orphan UIDs, and
deletes the corresponding documents from the Solr index. I just don't want to
re-invent the wheel if this feature is already present in Solr, as I would
otherwise need to do more testing of my program in terms of performance and
scalability.

Thanks,
Barani
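For what it's worth, the compare-and-delete step that note describes can be
sketched roughly like this (the fetch of UIDs from Solr and from the DB is
left out, and all names here are assumptions, not part of any Solr API):

```python
def find_orphans(solr_uids, db_uids):
    """UIDs present in the Solr index but missing from the source DB."""
    return sorted(set(solr_uids) - set(db_uids))

def build_delete_command(uids):
    """Build the XML body for a Solr <delete> update request."""
    ids = "".join("<id>%s</id>" % uid for uid in uids)
    return "<delete>%s</delete>" % ids

# The resulting XML would then be POSTed to /update, followed by a <commit/>.
```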


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread Ezequiel Calderara
Can't you save the ids in another table on each delete of that data,
and then process those ids against Solr to delete them?
On Wed, Oct 20, 2010 at 11:51 AM, bbarani bbar...@gmail.com wrote:


 Hi,

 I have a very common question but couldnt find any post related to my
 question in this forum,

 I am currently initiating a full import each week but the data that have
 been deleted in the source is not update in my document as I am using
 clean=false.

 We are indexing multiple data by data types hence cant delete the index and
 do a complete re-indexing each week also we want to delete the orphan solr
 documents (for which the data is not present in back end DB) on a daily
 basis.

 Now my question is.. Is there a way I can use preImportDeleteQuery to
 delete
 the documents from SOLR for which the data doesnt exist in back end db? I
 dont have anything called delete status in DB, instead I need to get all
 the
 UID's from SOLR document and compare it with all the UID's in back end and
 delete the data from SOLR document for the UID's which is not present in
 DB.

 Any suggestion / ideas would be of great help.

 Note: Currently I have developed a simple program which will fetch the
 UID's
 from SOLR document and then connect to backend DB to check the orphan UID's
 and delete the documents from SOLR index corresponding to orphan UID's. I
 just dont want to re-invent the wheel if this feature is already present in
 SOLR as I need to do more testing in terms of performance / scalability for
 my program..

 Thanks,
 Barani


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: Multiple facet - fq

2010-10-20 Thread Pradeep Singh
fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ yvzslmyilm...@gmail.com
 wrote:

 Under the category facet, there are multiple selections, which can be
 personal, corporate or other.

 How can I get both personal and corporate ones, I tried
 fq=category:corporatefq=category:personal

 It looks easy, but I can't find the solution.


 --

 Yavuz Selim YILMAZ



Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread bbarani

ironicnet,

Thanks for your reply.

We actually use a virtual DB modelling tool to fetch the data from various
sources at run time, hence we don't have any control over the source.

We consolidate the data from more than one source and index the consolidated
data using Solr. We don't have any kind of update / access rights to the
source data.

Thanks.
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739642.html


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread Mike Sokolov
Since you are performing a complete reload of all of your data, I don't 
understand why you can't create a new core, load your new data, swap 
your application to look at the new core, and then erase the old one, if 
you want.


Even so, you could track the timestamps on all your documents, which 
will be updated when you update the content.  Then when you're done you 
could delete anything with a timestamp prior to the time you started the 
latest import.


-Mike
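The timestamp-based cleanup above can be sketched as follows (this assumes a
`timestamp` field with `default="NOW"` in schema.xml; the field name is an
assumption, not a Solr default):

```python
from datetime import datetime

def build_prune_query(import_start):
    """Delete-by-query body removing docs not touched by the latest import.

    import_start is the (UTC) time the latest full import began; any doc
    whose timestamp predates it was not re-indexed and is an orphan.
    """
    cutoff = import_start.strftime("%Y-%m-%dT%H:%M:%SZ")
    return "<delete><query>timestamp:[* TO %s]</query></delete>" % cutoff

# e.g. POST build_prune_query(datetime(2010, 10, 20, 12, 0, 0)) to /update
```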

On 10/20/2010 11:59 AM, bbarani wrote:

ironicnet,

Thanks for your reply.

We actually use virtual DB modelling tool to fetch the data from various
sources during run time hence we dont have any control over the source.

We consolidate the data from more than one source and index the consolidated
data using SOLR. We dont have any kind of update / access rights to source
data.

Thanks.
Barani
   


Re: Spatial

2010-10-20 Thread Pradeep Singh
Thanks for your response Grant.

I already have the bounding box based implementation in place. And on a
document base of around 350K it is super fast.

What about a document base of millions of documents? While a tier based
approach will narrow down the document space significantly this concern
might be misplaced because there are other numeric range queries I am going
to run anyway which don't have anything to do with spatial query. But the
keyword here is numeric range query based on NumericField, which is going to
be significantly faster than regular number based queries. I see that the
dynamic field type _latLon is of type double and not tdouble by default. Can
I have your input about that decision?

-Pradeep

On Tue, Oct 19, 2010 at 6:10 PM, Grant Ingersoll gsing...@apache.orgwrote:


 On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:

  https://issues.apache.org/jira/browse/LUCENE-2519
 
  If I change my code as per 2519
 
  to have this  -
 
  public double[] coords(double latitude, double longitude) {
 double rlat = Math.toRadians(latitude);
 double rlong = Math.toRadians(longitude);
 double nlat = rlong * Math.cos(rlat);
 return new double[]{nlat, rlong};
 
   }
 
 
  return this -
 
  x = (gamma - gamma[0]) cos(phi)
  y = phi
 
  would it make it give correct results? Correct projections, tier ids?

 I'm not sure.  I have a lot of doubt around that code.  After making that
 correction, I spent several days trying to get the tests to pass and
 ultimately gave up.  Does that mean it is wrong?  I don't know.  I just
 don't have enough confidence to recommend it, given that the tests I was
 asking it to do I could verify through other tools.  Personally, I would
 recommend seeing if one of the non-tier based approaches suffices for your
 situation and use that.

 -Grant


RE: Multiple facet - fq

2010-10-20 Thread Tim Gilbert
As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>

You can always specify your operator in your search between your facets.


fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 more facets with
AND OR + and - options and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim
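One detail worth spelling out: separate fq parameters are always intersected,
so the union of two facet values has to go inside a single fq. A quick sketch
of assembling such a request (the endpoint path is an assumption):

```python
from urllib.parse import urlencode

def build_select_url(q, fqs):
    """Assemble a select URL; each entry in fqs becomes its own fq
    parameter, and separate fq parameters are intersected (ANDed)."""
    params = [("q", q)] + [("fq", fq) for fq in fqs]
    return "/solr/select?" + urlencode(params)

# union of two facet values must live inside a single fq:
url = build_select_url("*:*", ["(category:corporate OR category:personal)"])
```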

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com] 
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
yvzslmyilm...@gmail.com
 wrote:

 Under the category facet, there are multiple selections, which can be
 personal, corporate or other.

 How can I get both personal and corporate ones, I tried
 fq=category:corporatefq=category:personal

 It looks easy, but I can't find the solution.


 --

 Yavuz Selim YILMAZ



RE: Multiple facet - fq

2010-10-20 Thread Tim Gilbert
Sorry, what Pradeep said, not Prasad.  My apologies Pradeep.

-Original Message-
From: Tim Gilbert 
Sent: Wednesday, October 20, 2010 12:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Multiple facet - fq

As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>

You can always specify your operator in your search between your facets.


fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 more facets with
AND OR + and - options and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com] 
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
yvzslmyilm...@gmail.com
 wrote:

 Under the category facet, there are multiple selections, which can be
 personal, corporate or other.

 How can I get both personal and corporate ones, I tried
 fq=category:corporatefq=category:personal

 It looks easy, but I can't find the solution.


 --

 Yavuz Selim YILMAZ



EmbeddedSolrServer with one core and schema.xml loaded via ClassLoader, is it possible?

2010-10-20 Thread Paolo Castagna

Hi,
I am trying to use EmbeddedSolrServer with just one core and I'd like to
load solrconfig.xml, schema.xml and other configuration files from a jar
via getResourceAsStream(...).

I've tried to use SolrResourceLoader, but all my attempts failed with a
RuntimeException: Can't find resource [...].

Is it possible to construct an EmbeddedSolrServer loading all the config
files from a jar file?

Thank you in advance for your help,
Paolo


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread Shawn Heisey

On 10/20/2010 9:59 AM, bbarani wrote:

We actually use virtual DB modelling tool to fetch the data from various
sources during run time hence we dont have any control over the source.

We consolidate the data from more than one source and index the consolidated
data using SOLR. We dont have any kind of update / access rights to source
data.


It seems likely that those who are in control of the data sources would 
be maintaining some kind of delete log, and that they should be able to 
make those logs available to you.


For my index, the data comes from a MySQL database.  When a delete is 
done at the database level, a database trigger records the old 
information to a main delete log table, as well as a separate table for 
the search system.  The build system uses that separate table to run 
deletes every ten minutes and keeps it trimmed to 48 hours of delete 
history.





Re: Announcing Blaze - Appliance for Solr

2010-10-20 Thread Initcron Labs
 oh, I guess they have just uploaded it... when I've checked the file list
 was empty :)


Yes, the upload is still in progress.

Currently all formats are on the SUSE Gallery page. On SourceForge I have
managed to upload four formats so far, including the live CD, preload CD,
HDD/USB image and OVF format. Two more formats to go, for Xen and
VMware/VirtualBox. Visit the SourceForge page in a few hours and you'll see
all the files. Thanks for your patience.

Is this free or do you charge for it?


This is both libre http://en.wikipedia.org/wiki/Gratis_versus_Libre and
gratis :)  Feel free to use, share and modify it as you like, as long as you
adhere to the licenses Solr and Tomcat come with.

And please, please... please give us feedback and suggestions on what
you would like to see added to this appliance.  As a next step we are
thinking of including ajax-solr, a very neat AJAX-based user interface for
Solr.


here are the download pages again for your reference,

Suse Studio :  http://susegallery.com/a/Kr7Ayv/blaze-appliance-for-solr
Sourceforge.net :  http://sourceforge.net/projects/blazeappliance/files/
Do check a Cool Prezi at : http://www.initcron.org/blaze/

Thanks
Initcron Labs






 Cheers,
 Stefan

 On 20.10.2010 13:03, Initcron Labs wrote:

  Initcron Labs Announces Blaze - Appliance for Solr .

 Read more at and download from :  http://www.initcron.org/blaze

 Blaze is a tailor made appliance  preinstalled and preconfigured with
 Apache
 Solr  running within Tomcat servlet  container. It  lets you focus on
 developing applications based on Apache Solr platform  and not worry
 about
 installation, configuration complexities.

 Blaze Appliance is built with Suse Studio and is available in following
 formats

 - LiveCD
 - USB Drive/ HDD Image
 - Preload ISO
 - Virtual Machine Images
 - Xen
 - VMWare, Virtualbox
 - OVM Open Format
 - Amazon EC2 Image Format


 You could get your solr installation setup and running within minutes.
 The appliance is also production ready being configured with Tomcat.
 Comes
 with webyast for web administration and configuration of the appliance.



 Thanks

 Initcron Labs

 www.initcron.org




  --
 ***
 Stefan Moises
 Senior Softwareentwickler

 shoptimax GmbH
 Guntherstraße 45 a
 90461 Nürnberg
 Amtsgericht Nürnberg HRB 21703
 GF Friedrich Schreieck

 Tel.: 0911/25566-25
 Fax:  0911/25566-29
 moi...@shoptimax.de
 http://www.shoptimax.de
 ***







Multiple Similarity

2010-10-20 Thread raimon.bosch


Hi,

Is it possible to define different Similarity classes for different fields?
We have a use case where we are interested in avoid term frequency (tf) when
our fields are multiValued.

Regards,
Raimon Bosch.
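As far as I know, a custom Similarity is configured globally in schema.xml at
this point, not per field. If the goal is only to ignore term frequency on
specific fields, one alternative worth checking is the per-field
omitTermFreqAndPositions attribute (available since Solr 1.4) — note it also
disables position data, and therefore phrase queries, on that field. A
sketch, with a hypothetical field name:

```xml
<!-- tf (and positions) dropped for this multiValued field only -->
<field name="tags" type="text" indexed="true" stored="true"
       multiValued="true" omitTermFreqAndPositions="true"/>
```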
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Similarity-tp1740290p1740290.html


Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
Hi Solr Users,

I used the TermsComponent to walk through all the indexed terms and find
ones of particular interest (named entities). And now, I'd like to search
for documents that contain these particular entities. I have both query-time
and index-time stemming set for the field, which means I can't just hit the
normal search handler because as I understand, it will stem the
already-stemmed term. Any ideas about how to search directly for the indexed
term? Maybe something I can do at query-time to disable stemming?

Thanks!
sasank


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread Ezequiel Calderara
You could also set an expiration policy, and delete documents that have
expired after some time... but I don't know if you can iterate over the
existing ids.

On Wed, Oct 20, 2010 at 1:34 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/20/2010 9:59 AM, bbarani wrote:

 We actually use virtual DB modelling tool to fetch the data from various
 sources during run time hence we dont have any control over the source.

 We consolidate the data from more than one source and index the
 consolidated
 data using SOLR. We dont have any kind of update / access rights to source
 data.


 It seems likely that those who are in control of the data sources would be
 maintaining some kind of delete log, and that they should be able to make
 those logs available to you.

 For my index, the data comes from a MySQL database.  When a delete is done
 at the database level, a database trigger records the old information to a
 main delete log table, as well as a separate table for the search system.
  The build system uses that separate table to run deletes every ten minutes
 and keeps it trimmed to 48 hours of delete history.





-- 
__
Ezequiel.

Http://www.ironicnet.com


Sorting and filtering on fluctuating multi-currency price data?

2010-10-20 Thread Gregg Donovan
In our current search app, we have sorting and filtering based on item
prices. We'd like to extend this to support sorting and filtering in the
buyer's native currency with the items themselves listed in the seller's
native currency. E.g: as a buyer, if my native currency is the Euro, my
search of all items between 10 and 20 Euros would also find all items listed
in USD between 13.90 and 27.80, in CAD between 14.29 and 28.58, etc.

I wanted to run a few possible approaches by the list to see if we were on
the right track or not. Our index is updated every few minutes, but we only
update our currency conversions every few hours.

The easiest approach would be to update the documents with non-USD listings
every few hours with the USD-converted price. That will be fine, but if the
number of non-USD listings is large, this would be too expensive (i.e. large
parts of the index getting recreated frequently).

Another approach would be to use ExternalFileField and keep the price data,
normalized to USD, outside of the index. Every time the currency rates
changed, we would calculate new normalized prices for every document in the
index.

Still another approach would be to do the currency conversion at IndexReader
warmup time. We would index native price and currency code and create a
normalized currency field on the fly. This would be somewhat like
ExternalFileField in that it involved data from outside the index, but it
wouldn't need to be scoped to the parent SolrIndexReader, but could be
per-segment. Perhaps a custom poly-field could accomplish something like
this?

Has anyone dealt with this sort of problem? Do any of these approaches sound
more or less reasonable? Are we missing anything?

Thanks for the help!

Gregg Donovan
Technical Lead, Search
Etsy.com
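For the warmup-time approach, the conversion itself is cheap; a sketch of
turning a buyer-currency range into per-listing-currency filter clauses, so
native prices stay in the index untouched (the rates table and field names
here are made up for illustration):

```python
# hypothetical conversion table, refreshed every few hours
RATES_TO_USD = {"USD": 1.00, "EUR": 1.39, "CAD": 0.70}

def range_filter(lo, hi, buyer_currency, field="price"):
    """One range clause per listing currency, ORed together."""
    base = RATES_TO_USD[buyer_currency]
    clauses = []
    for code in sorted(RATES_TO_USD):
        rate = RATES_TO_USD[code]
        clauses.append("(currency:%s AND %s:[%.2f TO %.2f])"
                       % (code, field, lo * base / rate, hi * base / rate))
    return " OR ".join(clauses)
```

A buyer's 10–20 EUR range then expands into the equivalent native-price range
for every currency in the table.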


Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Pradeep Singh
Here's what I would do -

Search all the fields every time, regardless of language. Use one handler and
specify all of these in qf and pf:
question_en, answer_en,
question_fr, answer_fr,
question_pl,  answer_pl

Individual field based analyzers will take care of appropriate tokenization
and you will get a match across all languages.

Even with this setup if you wanted you could also have a separate field
called language and use a fq to limit searches to that language only.

-Pradeep
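A sketch of what the schema side of this could look like (the field type
names and the language fields other than English/French are illustrative,
not a tested config):

```xml
<!-- one analyzed field pair per language -->
<field name="question_en" type="text_en" indexed="true" stored="true"/>
<field name="answer_en"   type="text_en" indexed="true" stored="true"/>
<field name="question_fr" type="text_fr" indexed="true" stored="true"/>
<field name="answer_fr"   type="text_fr" indexed="true" stored="true"/>
<!-- ... repeat for de, ru, uk, pl ... -->
<field name="language" type="string" indexed="true" stored="true"/>
```

The dismax handler's qf/pf would then list every question_*/answer_* field,
and fq=language:fr would narrow a search to one language.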

On Wed, Oct 20, 2010 at 6:03 AM, Jakub Godawa jakub.god...@gmail.comwrote:

 Hi everyone! (my first post)

 I am new, but really curious about the usefulness of lucene/solr in document
 search from web applications. I use Ruby on Rails to create one, with
 plugin acts_as_solr_reloaded that makes connection between web app and
 solr easy.

 So I am at a point where I know that a good solution is to prepare
 multi-language documents with fields like:
 question_en, answer_en,
 question_fr, answer_fr,
 question_pl,  answer_pl... etc.

 I need to create an index that would work with 6 languages: english,
 french,
 german, russian, ukrainian and polish.

 My questions are:
 1. Is it doable to have just one search field that behaves like Google's
 for
 all those documents? It can be an option to indicate a language to search.
 2. How should I begin changing the solr/conf/schema.xml (or other) file to
 tailor it to my needs? As I am a real rookie here, I am still a bit
 confused
 about fields, fieldTypes and their connection with particular field
 (ex.
 answer_fr) and the tokenizers and analyzers. If someone can provide a
 basic step by step tutorial on how to make it work in two languages I would
 be more that happy.
 3. Are all those languages supported (officially/unofficially) by
 lucene/solr?

 Thank you for help,
 Jakub Godawa.



Multiple indexes inside a single core

2010-10-20 Thread ben boggess
We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes
and another is only updated every week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.

The other problem is that we will likely need to shard this large single
index, and there isn't a clean way to shard randomly and evenly across all of
the data.  We would, however, like to shard a single data type.  If we could
use multiple indexes, we would likely also be sharding only a small subset of
them.

Thanks in advance,

Ben


xpath processing

2010-10-20 Thread pghorpade


I am trying to import MODS XML data into Solr using the XML/HTTP datasource.

This does not work with the XPathEntityProcessor of the DataImportHandler:
xpath=/mods/name/namePart[@type = 'date']

I actually have 143 records with the type attribute set to 'date' for the
element namePart.


Thank you
Parinita
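For reference, a data-config.xml sketch of such an entity (the url and
forEach values are placeholders; only the xpath is taken from the question):

```xml
<entity name="mods" processor="XPathEntityProcessor"
        url="mods.xml" forEach="/mods">
  <field column="name_date" xpath="/mods/name/namePart[@type='date']"/>
</entity>
```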


Re: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Erick Erickson
Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, the phrase black cat his blue
#would# match across entry boundaries.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown jason.br...@sjp.co.uk wrote:

 Thanks Jonathan.

 To further clarify, I understand that the match of

 my blue rabbit

 would have to be found in 1 element (of my multi-valued defined field) for
 the phrase boost on that field to kick in.

 If for example my document had the following 3 entries for the multi-value
 field


 my black cat
 his blue car
 her pink rabbit

 Then I assume the phrase boost would not kick in, as the search term (my
 blue rabbit) isn't found in a single element (but can be found across them).

 Thanks again

 Jason.

 

 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Tue 19/10/2010 17:27
 To: solr-user@lucene.apache.org
 Subject: Re: Dismax phrase boosts on multi-value fields



 You are correct.  The query needs to match as a phrase. It doesn't need
 to match everything. Note that if a value is:

 long sentence with my blue rabbit in it,

 then query my blue rabbit will also match as a phrase, for phrase
 boosting or query purposes.

 Jonathan

 Jason Brown wrote:
 
 
  Hi - I have a multi-value field, so say for example it consists of
 
  'my black cat'
  'my white dog'
  'my blue rabbit'
 
  The field is whitespace parsed when put into the index.
 
  I have a phrase query boost configured on this field which I understand
 kicks in when my search term is found entirely in this field.
 
  So, if the search term is 'my blue rabbit', then I understand that my
 phrase boost will be applied as this is found entirely in this field.
 
  My question/presumption is that as this is a multi-valued field, only 1
 value of the multi-value needs to match for the phrase query boost (given my
 very imaginative set of test data :-) above, you can see that this obviously
 matches 1 value and not them all)
 
  Thanks for your help.
 
 
 
 
 
 
  If you wish to view the St. James's Place email disclaimer, please use
 the link below
 
  http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
 
 



 If you wish to view the St. James's Place email disclaimer, please use the
 link below

 http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer



Re: Searching with Number fields

2010-10-20 Thread Erick Erickson
I don't see anything obvious. Try going to the admin page and click the
analysis link. That'll let you see pretty much exactly how things get
parsed both for indexing and querying.

Unless your synonyms are somehow getting in the way, but I don't
see how.

Best
Erick

On Wed, Oct 20, 2010 at 5:15 AM, Hasnain hasn...@hotmail.com wrote:


 Hi,

   I'm having trouble searching on number fields. If the field contains
 alphanumerics then search works perfectly, but not when the value is all
 numbers. Can anyone suggest a solution?

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Searching-with-Number-fields-tp1737513p1737513.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching for Documents by Indexed Term

2010-10-20 Thread Erick Erickson
This may be a wild herring, but have you tried raw? NOTE: I'm a little
out of my depth here on what this actually does, so don't waste time by
thinking I'm an authority on this one. See:
http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html

and
http://wiki.apache.org/solr/SolrQuerySyntax
(this last under built in query parsers).

HTH
Erick
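A quick sketch of what a raw-parser request looks like (the field name is
hypothetical); the term is passed through with no analysis, so an
already-stemmed term from the terms component is matched as indexed:

```python
from urllib.parse import urlencode

def raw_term_query(field, term):
    """q using the {!raw} local-params parser: term matched as indexed."""
    return "/solr/select?" + urlencode({"q": "{!raw f=%s}%s" % (field, term)})

url = raw_term_query("content", "run")
```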

On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri sas...@gmail.com wrote:

 Hi Solr Users,

 I used the TermsComponent to walk through all the indexed terms and find
 ones of particular interest (named entities). And now, I'd like to search
 for documents that contain these particular entities. I have both
 query-time
 and index-time stemming set for the field, which means I can't just hit the
 normal search handler because as I understand, it will stem the
 already-stemmed term. Any ideas about how to search directly for the
 indexed
 term? Maybe something I can do at query-time to disable stemming?

 Thanks!
 sasank



Re: How can I get correct stemmed query?

2010-10-20 Thread Jerad

Thank you very much~! I'll try it :)


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1742898.html


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread Erick Erickson
We are indexing multiple data by data types hence cant delete the index
and
do a complete re-indexing each week also we want to delete the orphan solr
documents (for which the data is not present in back end DB) on a daily
basis.

Can you make delete by query work? Something like delete all Solr docs of
a certain type and do a full re-index of just that type?

I have no idea whether this is practical or not
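For reference, such a per-type delete-by-query would be posted to /update as
something like this (the field name and value are hypothetical):

```xml
<delete><query>datatype:products</query></delete>
```

followed by a full re-import of just that type and a commit.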

But your solution also works. There's really no way Solr #can# know about
deleted database records, especially since the uniqueKey field is
completely
arbitrarily defined.

Best
Erick

On Wed, Oct 20, 2010 at 10:51 AM, bbarani bbar...@gmail.com wrote:


 Hi,

 I have a very common question but couldnt find any post related to my
 question in this forum,

 I am currently initiating a full import each week but the data that have
 been deleted in the source is not update in my document as I am using
 clean=false.

 We are indexing multiple data by data types hence cant delete the index and
 do a complete re-indexing each week also we want to delete the orphan solr
 documents (for which the data is not present in back end DB) on a daily
 basis.

 Now my question is.. Is there a way I can use preImportDeleteQuery to
 delete
 the documents from SOLR for which the data doesnt exist in back end db? I
 dont have anything called delete status in DB, instead I need to get all
 the
 UID's from SOLR document and compare it with all the UID's in back end and
 delete the data from SOLR document for the UID's which is not present in
 DB.

 Any suggestion / ideas would be of great help.

 Note: Currently I have developed a simple program which will fetch the
 UID's
 from SOLR document and then connect to backend DB to check the orphan UID's
 and delete the documents from SOLR index corresponding to orphan UID's. I
 just dont want to re-invent the wheel if this feature is already present in
 SOLR as I need to do more testing in terms of performance / scalability for
 my program..

 Thanks,
 Barani


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
That looks very promising based on a couple of quick queries. Any objections
if I move the javadoc help into the wiki, specifically:

Create a term query from the input value without any text analysis or
 transformation whatsoever. This is useful in debugging, or when raw terms
 are returned from the terms component (this is not the default).
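As a sketch of that usage, a raw-parser request for an already-analyzed term might be built like this (the field name "body" and the stemmed term are made-up placeholders, not from the original setup):

```python
from urllib.parse import quote

# Sketch only: query an indexed (already-stemmed) term with the raw
# query parser, bypassing query-time analysis entirely.
term = "entiti"  # e.g. a stemmed term returned by the TermsComponent
q = quote("{!raw f=body}" + term)
url = "http://localhost:8983/solr/select?q=" + q
print(url)
```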


Thanks Erick!
sasank

On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson erickerick...@gmail.comwrote:

 This may be a wild herring, but have you tried raw? NOTE: I'm a little
 out of my depth here on what this actually does, so don't waste time by
 thinking I'm an authority on this one. See:

 http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html

 and
 http://wiki.apache.org/solr/SolrQuerySyntax
 (this last under built in query parsers).

 HTH
 Erick

 On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri sas...@gmail.com wrote:

  Hi Solr Users,
 
  I used the TermsComponent to walk through all the indexed terms and find
  ones of particular interest (named entities). And now, I'd like to search
  for documents that contain these particular entities. I have both
  query-time
  and index-time stemming set for the field, which means I can't just hit
 the
  normal search handler because as I understand, it will stem the
  already-stemmed term. Any ideas about how to search directly for the
  indexed
  term? Maybe something I can do at query-time to disable stemming?
 
  Thanks!
  sasank
 



Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Erick Erickson
See below:

But also search the archives for multilanguage, this topic has been
discussed
many times before. Lucid Imagination maintains a Solr-powered (of course)
searchable
list at: http://www.lucidimagination.com/search/


On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa jakub.god...@gmail.comwrote:

 Hi everyone! (my first post)

 I am new, but really curious about the usefulness of Lucene/Solr for document
 search from web applications. I use Ruby on Rails to create one, with the
 plugin acts_as_solr_reloaded that makes the connection between the web app
 and Solr easy.

 So I am at the point where I know that a good solution is to prepare
 multi-language documents with fields like:
 question_en, answer_en,
 question_fr, answer_fr,
 question_pl,  answer_pl... etc.

 I need to create an index that would work with 6 languages: english,
 french,
 german, russian, ukrainian and polish.

 My questions are:
 1. Is it doable to have just one search field that behaves like Google's for
 all those documents? An option to indicate the language to search would be
 fine.


This depends on what you mean by doable. Are you going to allow a French
user to search an English document (etc.)? But the real answer is yes, you
can if you .. There'll be tradeoffs.

Take a look at the dismax handler. It's kind of hard to grok all at once, but
you can cause it to search across multiple fields. That is, the user types
"language", and you can turn it into a complex query under the covers like
lang_en:language lang_fr:language lang_ru:language, etc. You can also
apply boosts. Note that this has obvious problems with, say, Russian. Half your
job will be figuring out what will satisfy the user.
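The multi-field idea above can also be done with dismax rather than hand-built per-field clauses. A minimal sketch, assuming the per-language field names from the question and a placeholder host/core URL (the boosts are illustrative, not a recommendation):

```python
from urllib.parse import urlencode

# Search every per-language field at once, boosting the user's own
# language (here English) over the others.
params = {
    "defType": "dismax",
    "qf": "question_en^2 answer_en^2 question_fr answer_fr question_pl answer_pl",
    "q": "language",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```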

You could also have a #different# dismax handler defined for various
languages. Say
the user was coming from Spanish. Consider a browseES handler. See
solrconfig.xml
for the default dismax handler. The Solr book mentioned above describes
this.


 2. How should I begin changing the solr/conf/schema.xml (or other) file to
 tailor it to my needs? As I am a real rookie here, I am still a bit confused
 about fields, fieldTypes and their connection with a particular field (e.g.
 answer_fr), and about the tokenizers and analyzers. If someone could provide
 a basic step-by-step tutorial on how to make it work in two languages I would
 be more than happy.


You have several choices here:
 The books "Lucene in Action" and "Solr 1.4 Enterprise Search Server" both
have discussions here.
 Spend some time on the solr/admin/analysis page. That page allows you to see
   pretty much exactly what each of the steps in an analyzer chain
accomplishes.


 3. Are all those languages supported (officially/unofficially) by
 lucene/solr?


See:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/Analyzer.html
Remember that Solr is built on Lucene, so these analyzers are available.



 Thank you for help,
 Jakub Godawa.


Best
Erick


RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jonathan Rochkind
Which is why the positionIncrementGap is set to a high number normally (100 in 
the sample schema.xml). With this being so, phrases won't match across values 
in a multi-valued field. If for some reason you were using a dismax ps phrase 
slop that was higher than your positionIncrementGap, you could get phrase boost 
matches across individual values. But normally that won't happen unless you do 
something odd to make it happen because you actually want it to, because 
positionIncrementGap is 100.

If for some reason you wanted to use a phrase slop of over 100 but still make 
sure it didn't go across individual value boundaries, you could just set 
positionIncrementGap to something absurdly high. (I'm not entirely sure why it 
isn't something absurdly high in the sample schema.xml, instead of the 
high-but-not-absurdly-so 100, since most people will probably expect individual 
values to be entirely separate.)

Jason, are you _trying_ to make that happen, or hoping it won't?  Ordinarily, 
it won't. 
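The gap mechanism described above can be illustrated with a toy model of token positions (a simplification of what Lucene actually stores, using the thread's own example values):

```python
# Toy model of positionIncrementGap in a multi-valued field: each
# value's tokens get consecutive positions, plus a gap between values.
def token_positions(values, gap=100):
    pos, positions = 0, {}
    for value in values:
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap  # the gap inserted between successive values
    return positions

field = ["my black cat", "his blue car", "her pink rabbit"]
p = token_positions(field)
# "cat" ends value one and "his" starts value two; the gap pushes them
# 101 positions apart, so no phrase slop under ~100 can bridge them.
print(p["his"][0] - p["cat"][0])
```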

From: Erick Erickson [erickerick...@gmail.com]
Sent: Wednesday, October 20, 2010 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, black cat his blue #would#
match.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown jason.br...@sjp.co.uk wrote:

 Thanks Jonathan.

 To further clarify, I understand that the match of

 my blue rabbit

 would have to be found in one element (of my multi-valued field) for
 the phrase boost on that field to kick in.

 If for example my document had the following 3 entries for the multi-value
 field


 my black cat
 his blue car
 her pink rabbit

 Then I assume the phrase boost would not kick in, as the search term (my
 blue rabbit) isn't found in a single element (but can be found across them).

 Thanks again

 Jason.

 

 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Tue 19/10/2010 17:27
 To: solr-user@lucene.apache.org
 Subject: Re: Dismax phrase boosts on multi-value fields



 You are correct.  The query needs to match as a phrase. It doesn't need
 to match everything. Note that if a value is:

 long sentence with my blue rabbit in it,

 then query my blue rabbit will also match as a phrase, for phrase
 boosting or query purposes.

 Jonathan

 Jason Brown wrote:
 
 
  Hi - I have a multi-value field, so say for example it consists of
 
  'my black cat'
  'my white dog'
  'my blue rabbit'
 
  The field is whitespace parsed when put into the index.
 
  I have a phrase query boost configured on this field which I understand
 kicks in when my search term is found entirely in this field.
 
  So, if the search term is 'my blue rabbit', then I understand that my
  phrase boost will be applied as this is found entirely in this field.
 
  My question/presumption is that as this is a multi-valued field, only 1
 value of the multi-value needs to match for the phrase query boost (given my
 very imaginative set of test data :-) above, you can see that this obviously
 matches 1 value and not them all)
 
  Thanks for your help.
 
 
 
 
 
 
  If you wish to view the St. James's Place email disclaimer, please use
 the link below
 
  http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
 
 






Re: Multiple indexes inside a single core

2010-10-20 Thread Erick Erickson
It seems to me that multiple cores are along the lines you
need, a single instance of Solr that can search across multiple
sub-indexes that do not necessarily share schemas, and are
independently maintainable.

This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
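Once the cores exist, one follow-on option is to query several of them at once with distributed search. A sketch, with placeholder host and core names (and note the cross-core scoring caveats raised elsewhere in this thread):

```python
from urllib.parse import urlencode

# Build a distributed-search request that fans out over two cores;
# the request itself is sent to any one core.
cores = ["fast_core", "weekly_core"]
shards = ",".join("localhost:8983/solr/%s" % c for c in cores)
params = urlencode({"shards": shards, "q": "rabbit"})
print("http://localhost:8983/solr/fast_core/select?" + params)
```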

HTH
Erick

On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:

 We are trying to convert a Lucene-based search solution to a
 Solr/Lucene-based solution.  The problem we have is that we currently have
 our data split into many indexes and Solr expects things to be in a single
 index unless you're sharding.  In addition to this, our indexes wouldn't
 work well using the distributed search functionality in Solr because the
 documents are not evenly or randomly distributed.  We are currently using
 Lucene's MultiSearcher to search over subsets of these indexes.

 I know this has been brought up a number of times in previous posts and the
 typical response is that the best thing to do is to convert everything into
 a single index.  One of the major reasons for having the indexes split up
 the way we do is because different types of data need to be indexed at
 different intervals.  You may need one index to be updated every 20 minutes
 and another is only updated every week.  If we move to a single index, then
 we will constantly be warming and replacing searchers for the entire
 dataset, and will essentially render the searcher caches useless.  If we
 were able to have multiple indexes, they would each have a searcher and
 updates would be isolated to a subset of the data.

 The other problem is that we will likely need to shard this large single
 index and there isn't a clean way to shard randomly and evenly across the
 of
 the data.  We would, however like to shard a single data type.  If we could
 use multiple indexes, we would likely be also sharding a small sub-set of
 them.

 Thanks in advance,

 Ben



Re: Searching for Documents by Indexed Term

2010-10-20 Thread Erick Erickson
Help updating/clarifying the Wiki is #always# appreciated.

Erick

On Wed, Oct 20, 2010 at 9:10 PM, Sasank Mudunuri sas...@gmail.com wrote:

 That looks very promising based on a couple of quick queries. Any
 objections
 if I move the javadoc help into the wiki, specifically:

 Create a term query from the input value without any text analysis or
  transformation whatsoever. This is useful in debugging, or when raw terms
  are returned from the terms component (this is not the default).


 Thanks Erick!
 sasank

 On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  This may be a wild herring, but have you tried raw? NOTE: I'm a little
  out of my depth here on what this actually does, so don't waste time by
  thinking I'm an authority on this one. See:
 
 
 http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html
 
  and
  http://wiki.apache.org/solr/SolrQuerySyntax
  (this last under built in query parsers).
 
  HTH
  Erick
 
  On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri sas...@gmail.com
 wrote:
 
   Hi Solr Users,
  
   I used the TermsComponent to walk through all the indexed terms and
 find
   ones of particular interest (named entities). And now, I'd like to
 search
   for documents that contain these particular entities. I have both
   query-time
   and index-time stemming set for the field, which means I can't just hit
  the
   normal search handler because as I understand, it will stem the
   already-stemmed term. Any ideas about how to search directly for the
   indexed
   term? Maybe something I can do at query-time to disable stemming?
  
   Thanks!
   sasank
  
 



Re: Multiple indexes inside a single core

2010-10-20 Thread Ben Boggess
Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely 
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:

 It seems to me that multiple cores are along the lines you
 need, a single instance of Solr that can search across multiple
 sub-indexes that do not necessarily share schemas, and are
 independently maintainable.
 
 This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
 
 HTH
 Erick
 
 On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
 
 We are trying to convert a Lucene-based search solution to a
 Solr/Lucene-based solution.  The problem we have is that we currently have
 our data split into many indexes and Solr expects things to be in a single
 index unless you're sharding.  In addition to this, our indexes wouldn't
 work well using the distributed search functionality in Solr because the
 documents are not evenly or randomly distributed.  We are currently using
 Lucene's MultiSearcher to search over subsets of these indexes.
 
 I know this has been brought up a number of times in previous posts and the
 typical response is that the best thing to do is to convert everything into
 a single index.  One of the major reasons for having the indexes split up
 the way we do is because different types of data need to be indexed at
 different intervals.  You may need one index to be updated every 20 minutes
 and another is only updated every week.  If we move to a single index, then
 we will constantly be warming and replacing searchers for the entire
 dataset, and will essentially render the searcher caches useless.  If we
 were able to have multiple indexes, they would each have a searcher and
 updates would be isolated to a subset of the data.
 
 The other problem is that we will likely need to shard this large single
 index and there isn't a clean way to shard randomly and evenly across the
 of
 the data.  We would, however like to shard a single data type.  If we could
 use multiple indexes, we would likely be also sharding a small sub-set of
 them.
 
 Thanks in advance,
 
 Ben
 


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread ben boggess
 Now my question is.. Is there a way I can use preImportDeleteQuery to
 delete
 the documents from SOLR for which the data doesn't exist in the back-end db? I
 don't have anything called delete status in the DB; instead I need to get all
 the
 UID's from SOLR document and compare it with all the UID's in back end and
 delete the data from SOLR document for the UID's which is not present in
 DB.

I've done something like this with raw Lucene and I'm not sure how or if you
could do it with Solr as I'm relatively new to it.

We stored a timestamp for when we started to import and stored an update
timestamp field for every document added to the index.  After the data
import, we did a delete by query that matched all documents with a timestamp
older than when we started.  The assumption being that if we didn't update
the timestamp during the load, then the record must have been deleted from
the database.
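The timestamp approach above might look like this in Solr's XML update syntax (a sketch only; the field name "last_indexed_dt" and the update URL are assumptions, not from the original setup):

```python
from datetime import datetime, timezone

# Record when the import starts; every doc written by the import gets
# a fresh last_indexed_dt stamp.
import_started = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# ... run the full import here ...

# Anything still carrying an older stamp was not touched by the import,
# so it must have been deleted from the database:
delete_xml = (
    "<delete><query>last_indexed_dt:[* TO %s]</query></delete>" % import_started
)
# POST delete_xml to http://localhost:8983/solr/update, then commit.
print(delete_xml)
```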

Hope this helps.

Ben

On Wed, Oct 20, 2010 at 8:05 PM, Erick Erickson erickerick...@gmail.comwrote:

 We are indexing multiple data by data types, hence can't delete the index
 and
 do a complete re-indexing each week also we want to delete the orphan solr
 documents (for which the data is not present in back end DB) on a daily
 basis.

 Can you make delete by query work? Something like delete all Solr docs of
 a certain type and do a full re-index of just that type?

 I have no idea whether this is practical or not

 But your solution also works. There's really no way Solr #can# know about
 deleted database records, especially since the uniqueKey field is
 completely
 arbitrarily defined.

 Best
 Erick

 On Wed, Oct 20, 2010 at 10:51 AM, bbarani bbar...@gmail.com wrote:

 
  Hi,
 
  I have a very common question but couldn't find any post related to my
  question in this forum,
 
  I am currently initiating a full import each week, but the data that has
  been deleted in the source is not updated in my index, as I am using
  clean=false.
 
  We are indexing multiple data by data types, hence can't delete the index
 and
  do a complete re-indexing each week also we want to delete the orphan
 solr
  documents (for which the data is not present in back end DB) on a daily
  basis.
 
  Now my question is.. Is there a way I can use preImportDeleteQuery to
  delete
  the documents from SOLR for which the data doesn't exist in the back-end db? I
  don't have anything called delete status in the DB; instead I need to get all
  the
  UID's from SOLR document and compare it with all the UID's in back end
 and
  delete the data from SOLR document for the UID's which is not present in
  DB.
 
  Any suggestion / ideas would be of great help.
 
  Note: Currently I have developed a simple program which will fetch the
  UID's
  from SOLR document and then connect to backend DB to check the orphan
 UID's
  and delete the documents from SOLR index corresponding to orphan UID's. I
  just don't want to re-invent the wheel if this feature is already present
 in
  SOLR as I need to do more testing in terms of performance / scalability
 for
  my program..
 
  Thanks,
  Barani
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jason Brown
Thanks - I was hoping it wouldn't match, and I believe you've confirmed it 
won't in my case, as the default positionIncrementGap is set.

Many Thanks

Jason.


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Thu 21/10/2010 02:27
To: solr-user@lucene.apache.org
Subject: RE: Dismax phrase boosts on multi-value fields
 
Which is why the positionIncrementGap is set to a high number normally (100 in 
the sample schema.xml). With this being so, phrases won't match across values 
in a multi-valued field. If for some reason you were using a dismax ps phrase 
slop that was higher than your positionIncrementGap, you could get phrase boost 
matches across individual values. But normally that won't happen unless you do 
something odd to make it happen because you actually want it to, because 
positionIncrementGap is 100.

If for some reason you wanted to use a phrase slop of over 100 but still make 
sure it didn't go across individual value boundaries, you could just set 
positionIncrementGap to something absurdly high. (I'm not entirely sure why it 
isn't something absurdly high in the sample schema.xml, instead of the 
high-but-not-absurdly-so 100, since most people will probably expect individual 
values to be entirely separate.)

Jason, are you _trying_ to make that happen, or hoping it won't?  Ordinarily, 
it won't. 

From: Erick Erickson [erickerick...@gmail.com]
Sent: Wednesday, October 20, 2010 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, black cat his blue #would#
match.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown jason.br...@sjp.co.uk wrote:

 Thanks Jonathan.

 To further clarify, I understand that the match of

 my blue rabbit

 would have to be found in one element (of my multi-valued field) for
 the phrase boost on that field to kick in.

 If for example my document had the following 3 entries for the multi-value
 field


 my black cat
 his blue car
 her pink rabbit

 Then I assume the phrase boost would not kick in, as the search term (my
 blue rabbit) isn't found in a single element (but can be found across them).

 Thanks again

 Jason.

 

 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Tue 19/10/2010 17:27
 To: solr-user@lucene.apache.org
 Subject: Re: Dismax phrase boosts on multi-value fields



 You are correct.  The query needs to match as a phrase. It doesn't need
 to match everything. Note that if a value is:

 long sentence with my blue rabbit in it,

 then query my blue rabbit will also match as a phrase, for phrase
 boosting or query purposes.

 Jonathan

 Jason Brown wrote:
 
 
  Hi - I have a multi-value field, so say for example it consists of
 
  'my black cat'
  'my white dog'
  'my blue rabbit'
 
  The field is whitespace parsed when put into the index.
 
  I have a phrase query boost configured on this field which I understand
 kicks in when my search term is found entirely in this field.
 
  So, if the search term is 'my blue rabbit', then I understand that my
  phrase boost will be applied as this is found entirely in this field.
 
  My question/presumption is that as this is a multi-valued field, only 1
 value of the multi-value needs to match for the phrase query boost (given my
 very imaginative set of test data :-) above, you can see that this obviously
 matches 1 value and not them all)
 
  Thanks for your help.
 
 
 
 
 
 
 
 








Looking for Solr/Lucene Developers in India(Pune)

2010-10-20 Thread ST ST
If you are a Solr/Lucene developer in Pune, India and are interested in a
consulting opportunity overseas,
or in projects local to the area, please get in touch with me.

Thanks


Re: does solr support posting gzipped content?

2010-10-20 Thread Gora Mohanty
On Tue, Oct 19, 2010 at 9:34 PM, danomano dshopk...@earthlink.net wrote:

 Hi folks, I was wondering if there is any native support for posting gzipped
 files to solr?

  i.e. I'm testing a project where we inject our log files into solr for
  indexing. These log files are gzipped, and I figure it would take less
  network bandwidth to inject the gzipped files directly.

What do you mean by inject? Are you POSTing XML to Solr,
using a DataImportHandler, or what?

  Is there a way to do this, other than implementing my own ServletFilter or
  some such?

As far as I am aware there is no existing way to post gzipped
content to Solr. Integrating this into the DataImportHandler would
probably be the way to go.
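For reference, the client half of the "own ServletFilter" route might be sketched like this: gzip the POST body and mark it with Content-Encoding. This assumes a matching filter on the Solr side (not shown) that decompresses the stream before it reaches the update handler, which is not something stock Solr provides.

```python
import gzip
from io import BytesIO

# Compress an update payload before POSTing it.
xml = b"<add><doc><field name='id'>1</field></doc></add>"
buf = BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(xml)
body = buf.getvalue()
headers = {"Content-Type": "text/xml", "Content-Encoding": "gzip"}
# e.g. urllib.request.Request("http://localhost:8983/solr/update", body, headers)
print(gzip.decompress(body) == xml)
```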

Regards,
Gora


Re: how can i use solrj binary format for indexing?

2010-10-20 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim hialo...@gmail.com wrote:

Sorry for the delay in replying. Was caught up in various things this
week.

 Thank you for reply, Gora

 But I still have several questions.
  Did you use separate indexes?
  If so, you indexed 0.7 million XML files per instance
  and merged them. Is that right?

Yes, that is correct. We sharded the data by user ID, so that each of the 25
cores held approximately 0.7 million out of the 3.5 million records. We could
have used the sharded indices directly for search, but at least for now have
decided to go with a single, merged index.

  Please let me know how to run multiple instances and cores in your case.
[...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml:
  http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e.,
  schema, solrconfig.xml, etc., needs to be replicated across the
  cores.
* Decide which XML files you will post to which core, and do the
  POST with curl, as usual. You might need to write a little script
  to do this.
* After indexing on the cores is done, make sure to do a commit
  on each.
* Merge the sharded indexes (if desired) as described here:
  http://wiki.apache.org/solr/MergingSolrIndexes . One thing to
  watch out for here is disk space. When merging with Lucene
  IndexMergeTool, we found that a rough rule of thumb was that
  intermediate steps in the merge would require about twice as
  much space as the total size of the indexes to be merged. I.e.,
  if one is merging 40GB of data in sharded indexes, one should
  have at least 120GB free.
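The "decide which XML files you will post to which core" step above might be sketched as simple modulo routing on the user ID (core naming and the 25-way split are placeholders matching this thread's setup):

```python
NUM_CORES = 25

def core_for(user_id: int) -> str:
    # Route each record to one of 25 cores by user ID.
    return "core%02d" % (user_id % NUM_CORES)

# Each file is then POSTed to http://localhost:8983/solr/<core>/update
print(core_for(3), core_for(28))  # user 28 wraps around onto the same core
```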

Regards,
Gora


RAM increase

2010-10-20 Thread satya swaroop
Hi all,
  I increased my RAM size to 8GB and I want 4GB of it to be used
for Solr itself. Can anyone tell me the way to allocate the RAM for
Solr?


Regards,
satya