Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma
You must reindex the complete document, even if you just want to update a 
single field.

On Friday 03 December 2010 04:52:04 Adam Estrada wrote:
 OK part 2 of my previous question...
 
 Is there a way to batch update field values based on a certain criteria?
 For example, if thousands of documents have a field value of 'US' can I
 update all of them to 'United States' programmatically?
 
 Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Limit number of characters returned

2010-12-03 Thread Ahmet Arslan


--- On Fri, 12/3/10, Mark static.void@gmail.com wrote:

 From: Mark static.void@gmail.com
 Subject: Limit number of characters returned
 To: solr-user@lucene.apache.org
 Date: Friday, December 3, 2010, 5:39 AM
 Is there way to limit the number of
 characters returned from a stored field?
 
 For example:
 
 Say I have a document (~2K words) and I search for a word
 that's somewhere in the middle. I would like the document to
 match the search query but the stored field should only
 return the first 200 characters of the document. Is there
 anyway to accomplish this that doesn't involve two fields?

I don't think it is possible out-of-the-box. Maybe you can hack the highlighter to 
return the first 200 characters in the highlighting response.
Or a custom response writer can do that.

But if you will always be returning the first 200 characters of documents, I think 
creating an additional field with indexed="false" stored="true" will be more 
efficient. And you can make your original field indexed="true" stored="false"; 
your index size will shrink.

<copyField source="text" dest="textShort" maxChars="200"/>
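As a rough sketch of what the copyField/maxChars combination does at index time (a simplified model, not Solr's actual implementation):

```python
def copy_field(source_value, max_chars=200):
    """Model copyField with maxChars: at index time the destination
    field receives only the first max_chars characters of the source."""
    return source_value[:max_chars]
```

With the source field indexed but not stored, and the destination stored but not indexed, searches hit the full text while responses return only the short copy.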


  


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
 However suddenly CPU usage simply doubles, and sometimes eventually
 start using all 16 cores of the server, whereas the number of handled
 request is pretty stable, and even starts decreasing because of
 degraded user experience due to dramatic response times.


Hi Tanguy: This was fixed here:
https://issues.apache.org/jira/browse/LUCENE-2620.

You can apply the patch file there
(https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
and recompile your own lucene 2.9.x, or you can replace the lucene jar
file in your solr war with the newly released lucene-2.9.4 core jar...
which I think is due to be released later today!

Thanks for spending the time to report the problem... let us know if the
patch/lucene 2.9.4 doesn't fix it!


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
Actually, I took a look at the code again at the queries you mentioned:
I send queries to that field in the form (*term1*term2*)

I think the patch will not fix your problem... The only way I know you
can fix this would be to upgrade to lucene/solr trunk, where wildcard
comparison is linear in the length of the string.

In all other versions, it has much worse runtime, and that's what you
are experiencing.

Separately, even better than this would be to see if you can index
your content in a way that avoids these expensive queries. But this is
just a suggestion; what you are doing should still work fine.
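For illustration only (this is not Lucene's automaton implementation), a dynamic-programming matcher shows that `*` patterns like `*term1*term2*` can be matched in time proportional to len(pattern) × len(text), with no exponential backtracking:

```python
def wildcard_match(pattern, text):
    """Match a pattern containing '*' wildcards against text in
    O(len(pattern) * len(text)) time via dynamic programming."""
    # dp[j] is True when the pattern consumed so far can match text[:j]
    dp = [True] + [False] * len(text)
    for ch in pattern:
        if ch == '*':
            # '*' extends any existing match to every longer prefix of text
            for j in range(1, len(text) + 1):
                dp[j] = dp[j] or dp[j - 1]
        else:
            new = [False] * (len(text) + 1)
            for j in range(1, len(text) + 1):
                new[j] = dp[j - 1] and text[j - 1] == ch
            dp = new
    return dp[len(text)]
```

A naive recursive matcher retries every possible span for each `*`, which is the kind of worst-case behavior the pre-trunk WildcardQuery exhibits on multi-star patterns.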

On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir rcm...@gmail.com wrote:
 On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
 However suddenly CPU usage simply doubles, and sometimes eventually
 start using all 16 cores of the server, whereas the number of handled
 request is pretty stable, and even starts decreasing because of
 degraded user experience due to dramatic response times.


 Hi Tanguy: This was fixed here:
 https://issues.apache.org/jira/browse/LUCENE-2620.

 You can apply the patch file there
 (https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
 and recompile your own lucene 2.9.x, or you can replace the lucene jar
 file in your solr war with the newly released lucene-2.9.4 core jar...
 which I think is due to be released later today!

 Thanks for spending the time to report the problem... let us know the
 patch/lucene 2.9.4 doesnt fix it!



Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Tanguy Moal
Thank you very much Robert for replying so fast and accurately.

I actually have another idea in mind for providing similar
suggestions less expensively; I was weighing the work-around option
against reporting the issue.

I don't regret reporting it, since you came back with a possible fix. I'll give it a
try as soon as possible and let the list know.

Regards,

Tanguy

2010/12/3 Robert Muir rcm...@gmail.com:
 Actually, i took a look at the code again, the queries you mentioned:
 I send queries to that field in the form (*term1*term2*)

 I think the patch will not fix your problem... The only way i know you
 can fix this would be to upgrade to lucene/solr trunk, where wildcard
 comparison is linear to the length of the string.

 In all other versions, it has much worse runtime, and thats what you
 are experiencing.

 Separately, even better than this would be to see if you can index
 your content in a way to avoid these expensive queries. But this is
 just a suggestion, what you are doing should still work fine.

 On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir rcm...@gmail.com wrote:
 On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
 However suddenly CPU usage simply doubles, and sometimes eventually
 start using all 16 cores of the server, whereas the number of handled
 request is pretty stable, and even starts decreasing because of
 degraded user experience due to dramatic response times.


 Hi Tanguy: This was fixed here:
 https://issues.apache.org/jira/browse/LUCENE-2620.

 You can apply the patch file there
 (https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
 and recompile your own lucene 2.9.x, or you can replace the lucene jar
 file in your solr war with the newly released lucene-2.9.4 core jar...
 which I think is due to be released later today!

 Thanks for spending the time to report the problem... let us know the
 patch/lucene 2.9.4 doesnt fix it!




Re: Solr Multi-thread Update Transaction Control

2010-12-03 Thread Erick Erickson
From Solr's perspective, the fact that multiple threads are
sending data to be indexed is invisible; Solr is just
reading HTTP requests. So I don't think what you're asking
for is possible.

Could you outline the reason you want to do this? Perhaps
there's another way to accomplish it.

Best
Erick

2010/12/2 wangjb wang-ji...@kdc.benic.co.jp

 Hi,
  Now we are using Solr 1.4.1 and have encountered a problem.
  When multiple threads update Solr data at the same time, can every thread
 have its own separate transaction?
  If this is possible, how can we achieve it?
  Is there any suggestion here?
  Waiting online.
  Thank you for any useful reply.






Re: Joining Fields in and Index

2010-12-03 Thread Jan Høydahl / Cominvent
Hi,

I made a MappingUpdateRequestHandler which lets you map country codes to full 
country names with a config file. See 
https://issues.apache.org/jira/browse/SOLR-2151

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 3. des. 2010, at 00.03, Adam Estrada wrote:

 Hi,
 
 I was hoping to do it directly in the index but it was more out of curiosity 
 than anything. I can certainly map it in the DAO but again...I was hoping to 
 learn if it was possible in the index.
 
 Thanks for the feedback!
 
 Adam
 
 On Dec 2, 2010, at 5:48 PM, Savvas-Andreas Moysidis wrote:
 
 Hi,
 
 If you are able to do a full re-index then you could index the full names
 and not the codes. When you later facet on the Country field you'll get the
 actual name rather than the code.
 If you are not able to re-index then probably this conversion could be added
 at your application layer prior to displaying your results.(e.g. in your DAO
 object)
 
 On 2 December 2010 22:05, Adam Estrada estrada.adam.gro...@gmail.com wrote:
 
 All,
 
 I have an index that has a field with country codes in it. I have 7 million
 or so documents in the index and when displaying facets the country codes
 don't mean a whole lot to me. Is there any way to add a field with the full
 country names then join the codes in there accordingly? I suppose I can do
 this before updating the records in the index but before I do that I would
 like to know if there is a way to do this sort of join.
 
 Example: US - United States
 
 Thanks,
 Adam
 



Facet same field with different prefix

2010-12-03 Thread Eric Grobler
Hi Everyone,

Can I facet the same field twice with a different prefix, as per the example
below?

facet.field=myfield
f.myfield.facet.prefix=make
f.myfield.facet.sort=count

facet.field=myfield
f.myfield.facet.prefix=model
f.myfield.facet.sort=count


Thanks and Regards
Ericz


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
On Fri, Dec 3, 2010 at 7:49 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
 Thank you very much Robert for replying that fast and accurately.

 I have effectively an other idea in mind to provide similar
 suggestions less expansively, I was balancing between the work around
 and the report issue options.

 I don't regret it since you came with a possible fix. I'll give it a
 try as soon as possible, and let the list know.

I'm afraid the patch is only a hack for the case where you have more
than one '*' in sequence (e.g. foo**bar).
It doesn't fix the more general problem, which is that WildcardQuery
itself uses an inefficient algorithm: this more general problem is
only fixed in lucene/solr trunk.

If you really need these queries i definitely suggest at least trying
trunk, because you should get much better performance.

But it sounds like you might already have an idea to avoid using these
queries so this is of course the best.


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
No, there's no equivalent to SQL update for all values in a column. You'll
have to reindex all the documents.

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com
 wrote:

 OK part 2 of my previous question...

 Is there a way to batch update field values based on a certain criteria?
 For example, if thousands of documents have a field value of 'US' can I
 update all of them to 'United States' programmatically?

 Adam
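A minimal sketch of the client-side loop such a reindex requires, assuming the documents have already been fetched from Solr as plain dicts (the field and value names here are placeholders): rewrite the field, then re-send every full document.

```python
def batch_update(docs, field, old_value, new_value):
    """Rewrite one field across a batch of documents client-side.
    Solr has no in-place column update, so each full document must
    afterwards be re-sent (reindexed) in its entirety."""
    return [
        {**doc, field: new_value} if doc.get(field) == old_value else doc
        for doc in docs
    ]
```

For the country-code case that would be `batch_update(docs, "country", "US", "United States")` before posting the whole batch back to Solr.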


Problem with dismax mm

2010-12-03 Thread Em

Hi list,

I got a little problem with my mm definition:

2<-1 4<50% 5<66%

Here is what it *should* mean:

If there are 2 clauses, at least one has to match.
If there are more than 2 clauses, at least 50% should match (both rules seem
to mean the same, don't they?).
And if there are 5 or more clauses, at least 66% should match.

In case of 5 clauses, 3 should match, in case of 6 at least 4 should match
and so on.

However in some test-case I get only the intended behaviour with a
2-clause-query when I say mm=1.
If I got longer queries this would lead to very bad search-quality-results.

What is wrong with this mm-definition?

Thanks for suggestions.
- Em
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with dismax mm

2010-12-03 Thread Shawn Heisey

On 12/3/2010 6:18 AM, Em wrote:

I got a little problem with my mm definition:

2<-1 4<50% 5<66%


Are you defining this in a request handler in solrconfig.xml?  If you 
have it entered just like that, I think it may not be understanding it.  
You need to encode the < character.  Here's an excerpt from my dismax 
handler:

<str name="mm">2&lt;-1 4&lt;-50%</str>

If that's not the problem, then I am not sure what it is, and the 
experts will need more information - version, query URL, configs.


Shawn



Re: Problem with dismax mm

2010-12-03 Thread Erick Erickson
from:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
If there are less than 3 optional clauses, they all must match; for 3 to 5
clauses, one less than the number of clauses must match, for 6 or more
clauses, 80% must match, rounded down: 2<-1 5<80%

Personally, the mm parameter makes my head hurt.
As I read it, there are actually 4 buckets that rules apply to, not three
in your mm definition, see below.


Your mm param says, I think, that

clauses   required   rule
1         1          we haven't gotten to a rule yet; this is the default
2         2          we haven't gotten to a rule yet; this is the default

3         2          2<-1
4         3          2<-1

5         2          4<50%, rounded down

6         3          5<66% (6 * 0.66 = 3.96)
7         4          5<66%, rounded down

Personally, I think the percentages are mind warping and lead to
interesting behavior. I prefer to explicitly list the number of clauses
required, or relatively constant numbers of required clauses, something
like "between 3 and 5, one less; 6 to 9, two less" etc., so you don't get
weird steps like between 4 and 5 above. Plus, by the time you get to,
say, 7 clauses nobody can keep track of what correct behavior is anyway <G>.

So I think you're off by one position when applying your rules. Or the Wiki
page is misleading. Or the Wiki page is exactly correct and I'm mis-reading
it.
Like I said, mm makes my head hurt.

Best
Erick
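The table above can be checked mechanically. Here is a small model of the conditional mm rules (a sketch of the documented semantics, not Solr's actual parser):

```python
def min_should_match(mm_spec, num_clauses):
    """Model dismax's conditional mm spec, e.g. '2<-1 4<50% 5<66%'.
    Each 'n<value' applies when there are more than n optional clauses;
    the highest applicable n wins. Default: all clauses must match."""
    required = num_clauses
    conditions = sorted(
        (int(n), value)
        for n, value in (part.split('<') for part in mm_spec.split())
    )
    for n, value in conditions:
        if num_clauses > n:
            if value.endswith('%'):
                pct = int(value[:-1])
                if pct < 0:  # negative percent: that share may be missing
                    required = num_clauses + num_clauses * pct // 100
                else:        # positive percent: that share must match
                    required = num_clauses * pct // 100
            else:
                v = int(value)
                required = num_clauses + v if v < 0 else v
    return required
```

Running it over 1 to 7 clauses with "2<-1 4<50% 5<66%" reproduces the bucketed behavior in the table, including the step down from 3 required at 4 clauses to 2 required at 5.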


On Fri, Dec 3, 2010 at 8:18 AM, Em mailformailingli...@yahoo.de wrote:


 Hi list,

 I got a little problem with my mm definition:

  2<-1 4<50% 5<66%

 Here is what it *should* mean:

 If there are 2 clauses, at least one has to match.
 If there are more than 2 clauses, at least 50% should match (both rules
 seem
 to mean the same, don't they?).
 And if there are 5 or more clauses, at least 66% should match.

 In case of 5 clauses, 3 should match, in case of 6 at least 4 should match
 and so on.

 However in some test-case I get only the intended behaviour with a
 2-clause-query when I say mm=1.
 If I got longer queries this would lead to very bad search-quality-results.

 What is wrong with this mm-definition?

 Thanks for suggestions.
 - Em
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem with dismax mm

2010-12-03 Thread Em

Thank you both!

Erick,

What you said was absolutely correct.
I misunderstood the definition completely.

Now it works as intended.

Thank you!

Kind regards


Erick Erickson wrote:
 
 from:
 http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
 If there are less than 3 optional clauses, they all must match; for 3 to
 5
 clauses, one less than the number of clauses must match, for 6 or more
 clauses, 80% must match, rounded down: 2<-1 5<80%
 
 Personally, the mm parameter makes my head hurt.
 As I read it, there are actually 4 buckets that rules apply to, not three
 in your mm definition, see below.
 
 
 Your mm param says, I think, that
 
 clauses   required   rule
 1         1          we haven't gotten to a rule yet; this is the default
 2         2          we haven't gotten to a rule yet; this is the default
 
 3         2          2<-1
 4         3          2<-1
 
 5         2          4<50%, rounded down
 
 6         3          5<66% (6 * 0.66 = 3.96)
 7         4          5<66%, rounded down
 
 Personally, I think the percentages are mind warping and lead to
 interesting behavior. I prefer to explicitly list the number of clauses
 required, or relatively constant numbers of required clauses, something
 like "between 3 and 5, one less; 6 to 9, two less" etc., so you don't get
 weird steps like between 4 and 5 above. Plus, by the time you get to,
 say, 7 clauses nobody can keep track of what correct behavior is anyway <G>.
 
 So I think you're off by one position when applying your rules. Or the
 Wiki
 page is misleading. Or the Wiki page is exactly correct and I'm
 mis-reading
 it.
 Like I said, mm makes my head hurt.
 
 Best
 Erick
 
 
 On Fri, Dec 3, 2010 at 8:18 AM, Em mailformailingli...@yahoo.de wrote:
 

 Hi list,

 I got a little problem with my mm definition:

 2<-1 4<50% 5<66%

 Here is what it *should* mean:

 If there are 2 clauses, at least one has to match.
 If there are more than 2 clauses, at least 50% should match (both rules
 seem
 to mean the same, don't they?).
 And if there are 5 or more clauses, at least 66% should match.

 In case of 5 clauses, 3 should match, in case of 6 at least 4 should
 match
 and so on.

 However in some test-case I get only the intended behaviour with a
 2-clause-query when I say mm=1.
 If I got longer queries this would lead to very bad
 search-quality-results.

 What is wrong with this mm-definition?

 Thanks for suggestions.
 - Em
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2012079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Limit number of characters returned

2010-12-03 Thread Mark
Correct me if I am wrong, but I would like to return highlighted excerpts 
from the document, so I would still need to index and store the whole 
document, right (i.e., highlighting only works on stored fields)?


On 12/3/10 3:51 AM, Ahmet Arslan wrote:


--- On Fri, 12/3/10, Markstatic.void@gmail.com  wrote:


From: Markstatic.void@gmail.com
Subject: Limit number of characters returned
To: solr-user@lucene.apache.org
Date: Friday, December 3, 2010, 5:39 AM
Is there way to limit the number of
characters returned from a stored field?

For example:

Say I have a document (~2K words) and I search for a word
that's somewhere in the middle. I would like the document to
match the search query but the stored field should only
return the first 200 characters of the document. Is there
anyway to accomplish this that doesn't involve two fields?

I don't think it is possible out-of-the-box. May be you can hack highlighter to 
return that first 200 characters in highlighting response.
Or a custom response writer can do that.

But if you will be always returning first 200 characters of documents, I think creating additional field with 
indexed=false stored=true will be more efficient. And you can make your original field 
indexed=true stored=false, your index size will be diminished.

<copyField source="text" dest="textShort" maxChars="200"/>





finding exact case insensitive matches on single and multiword values

2010-12-03 Thread PeterKerk


Users call this URL on my site:
/?search=1city=den+haag
or even /?search=1city=Den+Haag (casing of ctyname can be anything)


Under water I call Solr:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city


but this returns 0 results, even though I KNOW there are exactly 54 records
that have an exact match on den haag (in this case even with lower casing
in DB).

citynames are stored with various casings in DB, so when searching with
solr, the search must ignore casing.


my schema.xml

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
<field name="city" type="string" indexed="true" stored="true"/>


To check what was going on, I opened my analysis.jsp, 

for field name I provide: city
for Field value (Index)  I provide: den haag
When I analyze this I get:
den haag

So that seems correct to me. Why is it that no results are returned?

My requirements summarized:
- I want to search independent of case on cityname:
when a user searches on "DEn HaAG" he will get the records that have the value
"Den Haag", but also records that have "den haag" etc.
- citynames may consist of multiple words but only an exact match is valid,
so when a user searches for "den", he will not find "den haag" records. And
when searching on "den haag" it will only return a match on that and not other
cities like "den bosch".

How can I achieve this?

I think I need a new fieldtype  in my schema.xml, but am not sure which
tokenizers and analyzers I need, here's what I tried:

<fieldType name="exactmatch" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_dutch.txt"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Help is really appreciated!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012207.html
Sent from the Solr - User mailing list archive at Nabble.com.


Negative fl param

2010-12-03 Thread Mark
When returning results is there a way I can say to return all fields 
except a certain one?


So say I have stored fields foo, bar and baz but I only want to return 
foo and bar. Is it possible to do this without specifically listing out 
the fields I do want?


Re: Limit number of characters returned

2010-12-03 Thread Erick Erickson
Yep, you're correct. CopyField is probably your simplest option here, as
Ahmet suggested.
A more complex solution would be your own response writer, but unless and
until your index gets cumbersome, I'd avoid that. Plus, storing the copied
contents only (without indexing them) shouldn't impact search much, since
this doesn't add any terms...

Best
Erick

On Fri, Dec 3, 2010 at 10:32 AM, Mark static.void@gmail.com wrote:

 Correct me if I am wrong but I would like to return highlighted excerpts
 from the document so I would still need to index and store the whole
 document right (ie.. highlighting only works on stored fields)?


 On 12/3/10 3:51 AM, Ahmet Arslan wrote:


 --- On Fri, 12/3/10, Markstatic.void@gmail.com  wrote:

  From: Markstatic.void@gmail.com
 Subject: Limit number of characters returned
 To: solr-user@lucene.apache.org
 Date: Friday, December 3, 2010, 5:39 AM
 Is there way to limit the number of
 characters returned from a stored field?

 For example:

 Say I have a document (~2K words) and I search for a word
 that's somewhere in the middle. I would like the document to
 match the search query but the stored field should only
 return the first 200 characters of the document. Is there
 anyway to accomplish this that doesn't involve two fields?

 I don't think it is possible out-of-the-box. May be you can hack
 highlighter to return that first 200 characters in highlighting response.
 Or a custom response writer can do that.

 But if you will be always returning first 200 characters of documents, I
 think creating additional field with indexed=false stored=true will be
 more efficient. And you can make your original field indexed=true
 stored=false, your index size will be diminished.

  <copyField source="text" dest="textShort" maxChars="200"/>






Re: Limit number of characters returned

2010-12-03 Thread Mark

Thanks for the response.

Couldn't I just use the highlighter and configure it to use the 
alternative field to return the first 200 characters?  In cases where 
there is a highlighter match I would prefer to show the excerpts anyway.


http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength

Is there something wrong with this method?

On 12/3/10 8:03 AM, Erick Erickson wrote:

Yep, you're correct. CopyField is probably your simplest option here as
Ahmet suggested.
A more complex solution would be your own response writer, but unless and
until you
index gets cumbersome, I'd avoid that. Plus, storing the copied contents
only shouldn't
impact search much, since this doesn't add any terms...

Best
Erick

On Fri, Dec 3, 2010 at 10:32 AM, Markstatic.void@gmail.com  wrote:


Correct me if I am wrong but I would like to return highlighted excerpts
from the document so I would still need to index and store the whole
document right (ie.. highlighting only works on stored fields)?


On 12/3/10 3:51 AM, Ahmet Arslan wrote:


--- On Fri, 12/3/10, Markstatic.void@gmail.com   wrote:

  From: Markstatic.void@gmail.com

Subject: Limit number of characters returned
To: solr-user@lucene.apache.org
Date: Friday, December 3, 2010, 5:39 AM
Is there way to limit the number of
characters returned from a stored field?

For example:

Say I have a document (~2K words) and I search for a word
that's somewhere in the middle. I would like the document to
match the search query but the stored field should only
return the first 200 characters of the document. Is there
anyway to accomplish this that doesn't involve two fields?


I don't think it is possible out-of-the-box. May be you can hack
highlighter to return that first 200 characters in highlighting response.
Or a custom response writer can do that.

But if you will be always returning first 200 characters of documents, I
think creating additional field with indexed=false stored=true will be
more efficient. And you can make your original field indexed=true
stored=false, your index size will be diminished.

<copyField source="text" dest="textShort" maxChars="200"/>






Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Erick Erickson
The root of your problem, I think, is fq=city:den+haag, which parses into
city:den +defaultfield:haag

Try parens, i.e. fq=city:(den haag).

Attaching debugQuery=on is often a way to see things like this quickly.

Also, if you haven't seen the analysis page from the admin page, it's really
valuable
for figuring out the effects of analyzers. You can probably do something
like:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true"
omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to get what you want.

Best
Erick
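The effect of the KeywordTokenizer + LowerCaseFilter chain suggested above can be modeled in a couple of lines (a sketch, not Solr's analysis code): the whole field value becomes a single lowercased token, so matching is exact but case-insensitive.

```python
def keyword_lowercase(value):
    """Model KeywordTokenizerFactory (whole input is one token)
    followed by LowerCaseFilterFactory."""
    return [value.lower()]
```

Because "den" and "den haag" produce different single tokens, a partial word never matches, while any casing of the full name does.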

On Fri, Dec 3, 2010 at 10:46 AM, PeterKerk vettepa...@hotmail.com wrote:



 Users call this URL on my site:
 /?search=1city=den+haag
 or even /?search=1city=Den+Haag (casing of ctyname can be anything)


 Under water I call Solr:

  http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city


 but this returns 0 results, even though I KNOW there are exactly 54 records
 that have an exact match on den haag (in this case even with lower casing
 in DB).

 citynames are stored with various casings in DB, so when searching with
 solr, the search must ignore casing.


 my schema.xml

 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>
 <field name="city" type="string" indexed="true" stored="true"/>


 To check what was going on, I opened my analysis.jsp,

 for field name I provide: city
 for Field value (Index)  I provide: den haag
 When I analyze this I get:
 den haag

 So that seems correct to me. Why is it that no results are returned?

 My requirements summarized:
 - I want to search independant of case on cityname:
when user searches on DEn HaAG he will get the records that have
 value
 Den Haag, but also records that have den haag etc.
 - citynames may consists of multiple words but only an exact match is
 valid,
 so when user searches for den, he will not find den haag records. And
 when searched on den haag it will only return match on that and not other
 cities like den bosch.

 How can I achieve this?

 I think I need a new fieldtype  in my schema.xml, but am not sure which
 tokenizers and analyzers I need, here's what I tried:

 <fieldType name="exactmatch" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords_dutch.txt"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 Help is really appreciated!
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012207.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread PeterKerk


You are right; this is what I see when I append the debug query (very very
useful btw!!!) in the old situation:
<arr name="parsed_filter_queries">
  <str>city:den title:haag</str>
  <str>PhraseQuery(themes:hotel en restaur)</str>
</arr>



I then changed the schema.xml to:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true"
omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="city" type="myField" indexed="true" stored="true"/> <!-- used
to be string -->


I then tried adding parentheses:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
also tried (without +):
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

Then I get:

<arr name="parsed_filter_queries">
  <str>city:den city:haag</str>
</arr>

And still 0 results.

But as you can see the query is split up into 2 separate words; I don't think
that is what I need?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: spellchecker results not as desired

2010-12-03 Thread abhayd

Thanks,
I was able to fix this issue with a combination of EdgeNGrams and fuzzy query.

Here are the details: 
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

I just added the fuzzy query operator and it seems to be working so far.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/spellchecker-results-not-as-desired-tp1789192p2012887.html
Sent from the Solr - User mailing list archive at Nabble.com.
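The EdgeNGram side of that approach can be sketched as follows (a model of EdgeNGram filtering's front-anchored grams; the min/max parameters here are illustrative defaults, not the post's exact settings):

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Produce front-anchored n-grams of a term, as EdgeNGram
    filtering does for prefix-based auto-suggest."""
    top = min(max_gram, len(term))
    return [term[:i] for i in range(min_gram, top + 1)]
```

Indexing every prefix this way is what lets a partial query like "pho" match the suggestion "phone"; the fuzzy operator then tolerates small typos on top of that.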


Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Geert-Jan Brits
When you went from StrField to TextField in your config you enabled
tokenizing (which I believe splits on whitespace by default),
which is why you see separate 'words' / terms in the debugQuery explanation.

I believe you want to keep your old StrField config and try quoting:

fq=city:"den+haag" or fq=city:"den haag"

Concerning the lower-casing: wouldn't it be easiest to do that at the
client? (I'm not sure at the moment how to do lowercasing with a StrField.)

Geert-jan


2010/12/3 PeterKerk vettepa...@hotmail.com



 You are right, this is what I see when I append the debug query (very very
 useful btw!!!) in old situation:
 <arr name="parsed_filter_queries">
   <str>city:den title:haag</str>
   <str>PhraseQuery(themes:hotel en restaur)</str>
 </arr>



 I then changed the schema.xml to:

 <fieldType name="myField" class="solr.TextField" sortMissingLast="true"
 omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="city" type="myField" indexed="true" stored="true"/> <!-- used
 to be string -->


 I then tried adding parentheses:

  http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
  also tried (without +):
  http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
  haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

 Then I get:

 <arr name="parsed_filter_queries">
   <str>city:den city:haag</str>
 </arr>

 And still 0 results

 But as you can see the query is split up into 2 separate words, I dont
 think
 that is what I need?


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.4 suggester component

2010-12-03 Thread abhayd

thanks ..

i used
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

with fuzzy operator..
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-1-4-suggester-component-tp1766915p2012946.html
Sent from the Solr - User mailing list archive at Nabble.com.


boosting certain docs based on a filed value

2010-12-03 Thread abhayd

hi 

I was looking to boost certain docs based on some values in an indexed field.

e.g.
pType
-
post paid
go phone

Would like to have post paid docs first and then go phone.
I checked the functional query but could not figure out.

Any help?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/boosting-certain-docs-based-on-a-filed-value-tp2012962p2012962.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Limit number of characters returned

2010-12-03 Thread Ahmet Arslan
 Couldn't I just use the highlighter and configure it to use
 the 
 alternative field to return the first 200 characters? 
 In cases where 
 there is a highlighter match I would prefer to show the
 excerpts anyway.
 
 http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
 http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength
 
 Is this something wrong with this method?

No, you can do that. It is perfectly fine.
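(If you do end up storing a separate short copy of the field, the truncation itself can also be done client-side before indexing. A rough sketch in Python — not Solr-specific; the 200-character limit and the cut-at-word-boundary behaviour are just illustrative:)

```python
def truncate(text, limit=200):
    """Return at most `limit` characters, cut at the last word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Avoid chopping a word in half; fall back to a hard cut if no space found.
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut

short = truncate("word " * 100)  # 500 chars in, at most 200 out
```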





Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
I wonder...I know that sed would work to find and replace the terms in all
of the csv files that I am indexing but would it work to find and replace
key terms in the index?

find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and
replace the country code with the full country name. I may just back up the
directory and try it. I have it running on csv files right now and it's
working wonderfully. For those of you interested, I am indexing the entire
Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip)
which gives me a pretty comprehensive world gazetteer. My next step is gonna
be to display the results as KML to view over a google globe.

Thoughts?

Adam

On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.comwrote:

 No, there's no equivalent to SQL update for all values in a column. You'll
 have to reindex all the documents.

 On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
 estrada.adam.gro...@gmail.com
  wrote:

  OK part 2 of my previous question...
 
  Is there a way to batch update field values based on a certain criteria?
  For example, if thousands of documents have a field value of 'US' can I
  update all of them to 'United States' programmatically?
 
  Adam
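Since there is no in-place update, the usual pattern is to transform the records on their way back in and reindex everything. A sketch of just the mapping step (plain Python; the two-entry code table is a stub standing in for the real Geonames country list):

```python
# Hypothetical code -> name table; in practice, fill it from the Geonames data.
COUNTRY_NAMES = {"US": "United States", "AF": "Afghanistan"}

def expand_codes(docs, field="country"):
    """Replace two-letter codes with full names, leaving unknown values alone."""
    for doc in docs:
        code = doc.get(field)
        doc[field] = COUNTRY_NAMES.get(code, code)
    return docs

docs = expand_codes([{"id": 1, "country": "US"}, {"id": 2, "country": "XX"}])
```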



nexus of synonyms and stemming, take 2

2010-12-03 Thread Will Milspec
hi all,

[This is a second attempt at emailing. The apache mailing list spam filter
apparently did not like my synonyms entry, ie.. classified my email as spam.
I have replaced phone with 'foo' , 'cell' with 'sell' and 'mobile' with
'nubile' ]

This is a fairly basic synonyms question: how does synonyms handle stemming?


Example: Synonyms.txt has entry:
  sell,sell foo,nubile,nubile foo,wireless foo

If I want to match on 'sell foos'...

a) do I need to add an entry for 'sell foos' (i.e. in addition to sell foo)
b) or will the stemmer (porter/snowball) handle this already


thanks

will


Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma


On Friday 03 December 2010 18:20:44 Adam Estrada wrote:
 I wonder...I know that sed would work to find and replace the terms in all
 of the csv files that I am indexing but would it work to find and replace
 key terms in the index?

It'll most likely corrupt your index. Offsets, positions etc won't have the 
proper meaning anymore.

 find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
 
 That command would iterate through all the files in the data directory and
 replace the country code with the full country name. I may just back up
 the directory and try it. I have it running on csv files right now and
 it's working wonderfully. For those of you interested, I am indexing the
 entire Geonames dataset http://download.geonames.org/export/dump/
 (allCountries.zip) which gives me a pretty comprehensive world gazetteer.
 My next step is gonna be to display the results as KML to view over a
 google globe.
 
 Thoughts?
 
 Adam
 
 On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
erickerick...@gmail.comwrote:
  No, there's no equivalent to SQL update for all values in a column.
  You'll have to reindex all the documents.
  
  On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
  estrada.adam.gro...@gmail.com
  
   wrote:
   
   OK part 2 of my previous question...
   
   Is there a way to batch update field values based on a certain
   criteria? For example, if thousands of documents have a field value of
   'US' can I update all of them to 'United States' programmatically?
   
   Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Negative fl param

2010-12-03 Thread Ahmet Arslan
 When returning results is there a way
 I can say to return all fields except a certain one?
 
 So say I have stored fields foo, bar and baz but I only
 want to return foo and bar. Is it possible to do this
 without specifically listing out the fields I do want?


There was a similar discussion. http://search-lucene.com/m/2qJaU1wImo3/

A workaround can be getting all stored field names from 
http://wiki.apache.org/solr/LukeRequestHandler and construct fl accordingly.
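The Luke workaround is easy to script: fetch the stored field names once, subtract the unwanted ones, and build the fl value from the rest. A sketch of the set-difference step (the field names are invented for the example):

```python
def build_fl(stored_fields, exclude):
    """Build an fl parameter from all stored fields minus the excluded ones."""
    excluded = set(exclude)
    keep = [f for f in stored_fields if f not in excluded]
    return ",".join(keep)

fl = build_fl(["foo", "bar", "baz"], exclude=["baz"])  # "foo,bar"
```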


  


Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Erick Erickson
Arrrgh, Geert-Jan is right, that's the 15th time at least this has tripped
me up.

I'm pretty sure that text will work if you escape the space, e.g.
city:(den\ haag). The debug output is a little confusing since it has a line
like
city:den haag

which almost looks wrong... but it worked
out OK on a couple of queries I tried.

Geert-Jan is also right in that filters aren't applied to string types
so there's two possibilities, either handle the casing on the client
side as he suggests and use string or make the text type work.


Sorry for the confusion
Erick

On Fri, Dec 3, 2010 at 11:54 AM, Geert-Jan Brits gbr...@gmail.com wrote:

 when you went from strField to TextField in your config you enabled
 tokenizing (which I believe splits on spaces by default),
 which is why you see separate 'words' / terms in the
 debugQuery-explanation.

 I believe you want to keep your old strField config and try quoting:

 fq=city:"den+haag" or fq=city:"den haag"

 Concerning the lower-casing: wouldn't it be easiest to do that at the
 client? (I'm not sure at the moment how to do lowercasing with a strField)
 .

 Geert-jan


 2010/12/3 PeterKerk vettepa...@hotmail.com

 
 
  You are right, this is what I see when I append the debug query (very
 very
  useful btw!!!) in old situation:
  <arr name="parsed_filter_queries">
 <str>city:den title:haag</str>
 <str>PhraseQuery(themes:"hotel en restaur")</str>
  </arr>
 
 
 
  I then changed the schema.xml to:
 
  <fieldType name="myField" class="solr.TextField" sortMissingLast="true"
  omitNorms="true">
  <analyzer>
 <tokenizer class="solr.KeywordTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  </fieldType>
 
  <field name="city" type="myField" indexed="true" stored="true"/> <!--
 used
  to be string -->
 
 
  I then tried adding parentheses:
 
 
 http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
  also tried (without +):
  http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
  haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
 
  Then I get:
 
  <arr name="parsed_filter_queries">
 <str>city:den city:haag</str>
  </arr>
 
  And still 0 results
 
  But as you can see the query is split up into 2 separate words. I don't
  think
  that is what I need.
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: score from two cores

2010-12-03 Thread Erick Erickson
Uhhm, what are you trying to do? What do you want to do with the scores from
two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I have multiple cores. How can I deal with score?

 Thanks so much for help!
 Xiaohui



Re: boosting certain docs based on a filed value

2010-12-03 Thread Ahmet Arslan
 I was looking to boost certain docs based on some values in
 a indexed field.
 
 e.g.
 pType
 -
 post paid
 go phone
 
 Would like to have post paid docs first and then go phone.
 I checked the functional query but could not figure out.

You can use 
http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 if you are 
using dismax. 

bq=pType:"post paid"^100

If you are using the default query parser then you can append this optional clause 
to your query: q = some other query pType:"post paid"^100


  


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
Have you considered defining synonyms for your code-to-country
conversion at index time (or query time for that matter)?

We may have an XY problem here. Could you state the high-level
problem you're trying to solve? Maybe there's a better solution...

Best
Erick

On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com
 wrote:

 I wonder...I know that sed would work to find and replace the terms in all
 of the csv files that I am indexing but would it work to find and replace
 key terms in the index?

 find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

 That command would iterate through all the files in the data directory and
 replace the country code with the full country name. I may just back up
 the
 directory and try it. I have it running on csv files right now and it's
 working wonderfully. For those of you interested, I am indexing the entire
 Geonames dataset http://download.geonames.org/export/dump/(allCountries.zip)
 which gives me a pretty comprehensive world gazetteer. My next step is
 gonna
 be to display the results as KML to view over a google globe.

 Thoughts?

 Adam

 On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  No, there's no equivalent to SQL update for all values in a column.
 You'll
  have to reindex all the documents.
 
  On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
  estrada.adam.gro...@gmail.com
   wrote:
 
   OK part 2 of my previous question...
  
   Is there a way to batch update field values based on a certain
 criteria?
   For example, if thousands of documents have a field value of 'US' can I
   update all of them to 'United States' programmatically?
  
   Adam
 



can solrj swap cores?

2010-12-03 Thread Will Milspec
hi all,

Does solrj support swapping cores?

One of our developers had initially tried swapping solr cores (e.g. core0
and core1) using the solrj api, but it failed. (don't have the exact error)
He subsequently replaced the call with straight HTTP (i.e. an HTTP client).

Unfortunately I don't have the exact error in front of me...

Solrj code:

    CoreAdminRequest car = new CoreAdminRequest();
    car.setCoreName("production");
    car.setOtherCoreName("reindex");
    car.setAction(CoreAdminParams.CoreAdminAction.SWAP);

    SolrServer solrServer = SolrUtil.getSolrServer();
    car.process(solrServer);
    solrServer.commit();

Finally, can someone comment on the solrj javadoc on CoreAdminRequest:
 * This class is experimental and subject to change.

thanks,

will


Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
First off...I know enough about Solr to be VERY dangerous so please bear
with me ;-) I am indexing the geonames database which only provides country
codes. I can facet the codes but to the end user who may not know all 249
codes, it isn't really all that helpful. Therefore, I want to map the full
country names to the country codes provided in the geonames db.
http://download.geonames.org/export/dump/

I used a simple split function to
chop the 850 meg txt file in to manageable csv's that I can import in to
Solr. Now that all 7 million + documents are in there, I want to change the
country codes to the actual country names. I would have liked to have done it
in the index but finding and replacing the strings in the csv seems to be
working fine. After that I can just reindex the entire thing.

Adam

On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.comwrote:

 Have you considered defining synonyms for your code-to-country
 conversion at index time (or query time for that matter)?

 We may have an XY problem here. Could you state the high-level
 problem you're trying to solve? Maybe there's a better solution...

 Best
 Erick

 On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada 
 estrada.adam.gro...@gmail.com
  wrote:

  I wonder...I know that sed would work to find and replace the terms in
 all
  of the csv files that I am indexing but would it work to find and replace
  key terms in the index?
 
  find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
 
  That command would iterate through all the files in the data directory
 and
   replace the country code with the full country name. I may just back
 up
  directory and try it. I have it running on csv files right now and it's
  working wonderfully. For those of you interested, I am indexing the
 entire
  Geonames dataset
 http://download.geonames.org/export/dump/(allCountries.zip)
  which gives me a pretty comprehensive world gazetteer. My next step is
  gonna
  be to display the results as KML to view over a google globe.
 
  Thoughts?
 
  Adam
 
  On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   No, there's no equivalent to SQL update for all values in a column.
  You'll
   have to reindex all the documents.
  
   On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
   estrada.adam.gro...@gmail.com
wrote:
  
OK part 2 of my previous question...
   
Is there a way to batch update field values based on a certain
  criteria?
For example, if thousands of documents have a field value of 'US' can
 I
update all of them to 'United States' programmatically?
   
Adam
  
 



dataimports response returns before done?

2010-12-03 Thread Tri Nguyen
Hi,
 
After issuing a dataimport, I've noticed Solr returns a response prior to 
finishing the import. Is this correct? Is there any way I can make Solr not 
return until it finishes?
 
If not, how do I poll for the status to see whether it finished or not?
 
thanks,
 
tri

Question about Solr Fieldtypes, Chaining of Tokenizers

2010-12-03 Thread Matthew Hall
Hey folks, I'm working with a fairly specific set of requirements for 
our corpus that needs a somewhat tricky text type for both indexing and 
searching.


The chain currently looks like this:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
   pattern="(.*?)(\p{Punct}*)$"
   replacement="$1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.SnowballPorterFilterFactory" language="English" 
protected="protwords.txt"/>

<filter class="solr.PatternReplaceFilterFactory"
   pattern="\p{Punct}"
   replacement=" "/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>

Now you will notice that I'm trying to add in a second tokenizer to this 
chain at the very end, this is due to the final replacement of 
punctuation to whitespace.  At that point I'd like to further break up 
these tokens to smaller tokens.


The reason for this is that we have a mixed normal english word and 
scientific corpus.  For example you could expect string like The 
symposium of TgThe(RX3fg+and) gene studies being added to the index, 
and parts of those phrases being searched on.


We want to be able to remove the stopwords in the mostly english parts 
of these types of statements, which the whitespace tokenizer, followed 
by removing trailing punctuation,  followed by the stopfilter takes care 
of.  We do not want to remove references to genetic information 
contained in allele symbols and the like.


Sadly as far as I can tell, you cannot chain tokenizers in the 
schema.xml, so does anyone have some suggestions on how this could be 
accomplished?


Oh, and let me add that the WordDelimiterFilter comes really close to 
what I want, but since we are unwilling to promote our solr version to 
the trunk (we are on the 1.4.x version atm), the inability to turn off 
the automatic phrase queries makes it a no go.  We need to be able to 
make searches on left/right match right/left.


My searches through the old material on this subject aren't really 
showing me much except some advice on using the copyField attribute.  
But my understanding is that this will simply take your original input 
to the field, and then analyze it in two different ways depending on the 
field definitions.  It would be very nice if it were copying the already 
analyzed version of the text... but that's not what its doing, right?


Thanks for any advice on this matter.

Matt




Re: dataimports response returns before done?

2010-12-03 Thread Ahmet Arslan

--- On Fri, 12/3/10, Tri Nguyen tringuye...@yahoo.com wrote:

 From: Tri Nguyen tringuye...@yahoo.com
 Subject: dataimports response returns before done?
 To: solr user solr-user@lucene.apache.org
 Date: Friday, December 3, 2010, 7:55 PM
 Hi,
  
 After issuing a dataimport, I've noticed solr returns a
 response prior to finishing the import. Is this correct?  
 Is there any way I can make solr not return until it
 finishes?
  
 If not, how do I ping for the status whether it finished or
 not?
  

So you want to do something at the end of the import?
http://wiki.apache.org/solr/DataImportHandler#EventListeners may help.

Also you can always poll solr/dataimport url and check status (busy,idle)
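The polling check can be a few lines of client code. The sketch below parses the status out of a DIH response body; the `<str name="status">` element is the usual shape of that response, but verify against your own Solr version before relying on it:

```python
import xml.etree.ElementTree as ET

def dih_status(xml_text):
    """Extract the DIH status string ('idle' or 'busy') from a status response."""
    root = ET.fromstring(xml_text)
    # The status is reported in a <str name="status"> element.
    node = root.find(".//str[@name='status']")
    return node.text if node is not None else None

sample = '<response><str name="status">idle</str></response>'
status = dih_status(sample)  # 'idle'
```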





RE: score from two cores

2010-12-03 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Please correct me if I am doing something wrong. I really appreciate your help!

I have a core for metadata (xml files) and a core for pdf documents. Sometimes 
I need to search them separately, sometimes I need to search both of them together. 
There is the same key which relates them for each item.

For example, the xml files look like following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<List>
<Item>  
<Key>rmaaac.pdf</Key>
<TI>something</TI>
<UI>rmaaac</UI>
</Item>
<Item>
   .
</List>

I index rmaaac.pdf file with same Key and UI field in another core. Here is the 
example after I index rmaaac.pdf.
  <?xml version="1.0" encoding="UTF-8" ?> 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int> 
  <int name="QTime">3</int> 
  <lst name="params">
  <str name="indent">on</str> 
  <str name="start">0</str> 
  <str name="q">collectionid: RM</str> 
  <str name="rows">10</str> 
  <str name="version">2.2</str> 
  </lst>
  </lst>
  <result name="response" numFound="1" start="0">
  <doc>
<str name="UI">rm</str> 
<str name="Key">rm.pdf</str>  
<str name="metadata_content">something</str>
  </doc>
  </result>

The result information which is displayed to the user comes from metadata, not from 
pdf files. If I search a term from documents, in order to display search 
results to the user, I have to get Keys from the documents and then redo the search from 
metadata. Then the score is different.

Please give me some suggestions!

Thanks so much,
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 03, 2010 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: score from two cores

Uhhm, what are you trying to do? What do you want to do with the scores from
two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I have multiple cores. How can I deal with score?

 Thanks so much for help!
 Xiaohui



Re: Negative fl param

2010-12-03 Thread Mark
Ok simple enough. I just created a SearchComponent that removes values 
from the fl param.


On 12/3/10 9:32 AM, Ahmet Arslan wrote:

When returning results is there a way
I can say to return all fields except a certain one?

So say I have stored fields foo, bar and baz but I only
want to return foo and bar. Is it possible to do this
without specifically listing out the fields I do want?


There was a similar discussion. http://search-lucene.com/m/2qJaU1wImo3/

A workaround can be getting all stored field names from 
http://wiki.apache.org/solr/LukeRequestHandler and construct fl accordingly.





Highlighting parameters

2010-12-03 Thread Mark

Is there a way I can specify separate configuration for 2 different fields?

For field 1 I want to display only 100 chars, for field 2 200 chars




Syncing 'delta-import' with 'select' query

2010-12-03 Thread Juan Manuel Alvarez
Hello everyone! I would like to ask you a question about DIH.

I am using a database and DIH to sync against Solr, and a GUI to
display and operate on the items retrieved from Solr.
When I change the state of an item through the GUI, the following happens:
a. The item is updated in the DB.
b. A delta-import command is fired to sync the DB with Solr.
c. The GUI is refreshed by making a query to Solr.

My problem comes between (b) and (c). The delta-import operation is
executed in a new thread, so my call returns immediately, refreshing
the GUI before the Solr index is updated causing the item state in the
GUI to be outdated.

I had two ideas so far:
1. Querying the status of the DIH after the delta-import operation and
do not return until it is idle. The problem I see with this is that
if other users execute delta-imports, the status will be busy until
all operations are finished.
2. Use Zoie. The first problem is that configuring it is not as
straightforward as it seems, so I don't want to spend more time trying
it until I am sure that this will solve my issue. On the other hand, I
think that I may suffer the same problem since the delta-import is
still firing in another thread, so I can't be sure it will be called
fast enough.

Am I pointing on the right direction or is there another way to
achieve my goal?

Thanks in advance!
Juan M.
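For option (1), a blocking wrapper is straightforward: poll until idle or until a timeout expires. A sketch with the status check abstracted into a callable (in practice it would request the /dataimport URL); the caveat from (1) still applies, since the busy flag is shared by all imports:

```python
import time

def wait_until_idle(get_status, timeout=30.0, interval=0.5):
    """Poll get_status() until it returns 'idle'; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "idle":
            return True
        time.sleep(interval)
    return False

# Fake status source standing in for a real request to /dataimport.
states = iter(["busy", "busy", "idle"])
done = wait_until_idle(lambda: next(states), timeout=5.0, interval=0.01)
```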


Re: boosting certain docs based on a filed value

2010-12-03 Thread abhayd

thanks!! that worked..

Can I enter the sequence too like postpaid,free,costly?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/boosting-certain-docs-based-on-a-filed-value-tp2012962p2013895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting parameters

2010-12-03 Thread Markus Jelsma
Yes


Some parameters may be overriden on a per-field basis with the following 
syntax:

  f.fieldName.originalParam=value

http://wiki.apache.org/solr/HighlightingParameters


 Is there a way I can specify separate configuration for 2 different fields.
 
 For field 1 I want to display only 100 chars, for field 2 200 chars


Re: Highlighting parameters

2010-12-03 Thread Ahmet Arslan
 Is there a way I can specify separate
 configuration for 2 different fields.
 
 For field 1 I want to display only 100 chars, for field 2 200
 chars
 

Yes, the parameter accepts per-field overrides. The syntax is described at 
http://wiki.apache.org/solr/HighlightingParameters#HowToOverride

f.TEXT.hl.maxAlternateFieldLength=80&f.CATEGORY.hl.maxAlternateFieldLength=100


  


Re: boosting certain docs based on a filed value

2010-12-03 Thread Ahmet Arslan
 
 thanks!! that worked..
 
 Can I enter the sequence too like postpaid,free,costly?
 

Does that mean you want to display postpaid first, then free, and lastly 
costly? 

If that's what you want, I think it is better to create a tint field using these 
values and then sort by this field.

postpaid=300
free=200
costly=100   sort=newTintField desc, score desc

http://wiki.apache.org/solr/CommonQueryParameters#sort
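The rank field suggested above can be computed at index time from a small lookup table. A sketch (the rank values are arbitrary; higher sorts first with desc):

```python
# Arbitrary ranks; higher = shown earlier when sorting desc.
PTYPE_RANK = {"post paid": 300, "free": 200, "costly": 100}

def with_rank(doc):
    """Attach a sortable integer rank derived from the pType field."""
    doc["pTypeRank"] = PTYPE_RANK.get(doc.get("pType"), 0)
    return doc

ranked = sorted(
    (with_rank(d) for d in [{"pType": "free"}, {"pType": "post paid"}]),
    key=lambda d: d["pTypeRank"],
    reverse=True,
)
```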


  


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
That will certainly work. Another option, assuming the country codes are
in their own field, would be to put the transformations into a synonym file
that was only used on that field. That way you'd get this without having
to do the pre-processing step on the raw data...

That said, if your pre-processing is working for you it may not be worth
your while
to worry about doing it differently...

Best
Erick

On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada estrada.adam.gro...@gmail.com
 wrote:

 First off...I know enough about Solr to be VERY dangerous so please bear
 with me ;-) I am indexing the geonames database which only provides country
 codes. I can facet the codes but to the end user who may not know all 249
 codes, it isn't really all that helpful. Therefore, I want to map the full
 country names to the country codes provided in the geonames db.
 http://download.geonames.org/export/dump/

 I used a simple split function
 to
 chop the 850 meg txt file in to manageable csv's that I can import in to
 Solr. Now that all 7 million + documents are in there, I want to change the
 country codes to the actual country names. I would have liked to have done it
 in the index but finding and replacing the strings in the csv seems to be
 working fine. After that I can just reindex the entire thing.

 Adam

 On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Have you considered defining synonyms for your code-to-country
  conversion at index time (or query time for that matter)?
 
  We may have an XY problem here. Could you state the high-level
  problem you're trying to solve? Maybe there's a better solution...
 
  Best
  Erick
 
  On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada 
  estrada.adam.gro...@gmail.com
   wrote:
 
   I wonder...I know that sed would work to find and replace the terms in
  all
   of the csv files that I am indexing but would it work to find and
 replace
   key terms in the index?
  
   find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {}
 \;
  
   That command would iterate through all the files in the data directory
  and
    replace the country code with the full country name. I may just back
  up
   the
   directory and try it. I have it running on csv files right now and it's
   working wonderfully. For those of you interested, I am indexing the
  entire
   Geonames dataset
  http://download.geonames.org/export/dump/(allCountries.zip)
   which gives me a pretty comprehensive world gazetteer. My next step is
   gonna
   be to display the results as KML to view over a google globe.
  
   Thoughts?
  
   Adam
  
   On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
 erickerick...@gmail.com
   wrote:
  
No, there's no equivalent to SQL update for all values in a column.
   You'll
have to reindex all the documents.
   
On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada 
estrada.adam.gro...@gmail.com
 wrote:
   
 OK part 2 of my previous question...

 Is there a way to batch update field values based on a certain
   criteria?
 For example, if thousands of documents have a field value of 'US'
 can
  I
 update all of them to 'United States' programmatically?

 Adam
   
  
 



Re: score from two cores

2010-12-03 Thread Erick Erickson
The scores will not be comparable. Scores are only relevant within one
search
on one core, so comparing them across two queries (even if it's the same
query
but against two different cores) is meaningless.

So, given your setup I would just use the results from one of the cores and
fill in
data from the other...

But why do you have two cores in the first place? Is it really necessary or
is it just
making things more complex?

Best
Erick

On Fri, Dec 3, 2010 at 1:36 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 Please correct me if I am doing something wrong. I really appreciate your
 help!

 I have a core for metadata (xml files) and a core for pdf documents.
 Sometimes I need search them separately, sometimes I need search both of
 them together. There is the same key which relates them for each item.

 For example, the xml files look like following:
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <List>
<Item>
<Key>rmaaac.pdf</Key>
<TI>something</TI>
<UI>rmaaac</UI>
</Item>
<Item>
   .
 </List>

 I index rmaaac.pdf file with same Key and UI field in another core. Here is
 the example after I index rmaaac.pdf.
  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">collectionid: RM</str>
  <str name="rows">10</str>
  <str name="version">2.2</str>
  </lst>
  </lst>
  <result name="response" numFound="1" start="0">
  <doc>
<str name="UI">rm</str>
<str name="Key">rm.pdf</str>
<str name="metadata_content">something</str>
  </doc>
  </result>

 The result information which is displayed to the user comes from metadata, not
 from pdf files. If I search a term from documents, in order to display
 search results to the user, I have to get Keys from the documents and then redo
 the search from metadata. Then the score is different.

 Please give me some suggestions!

 Thanks so much,
 Xiaohui

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, December 03, 2010 12:37 PM
 To: solr-user@lucene.apache.org
 Subject: Re: score from two cores

 Uhhm, what are you trying to do? What do you want to do with the scores
 from
 two cores?

 Best
 Erick

 On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
 xiao...@mail.nlm.nih.gov wrote:

  I have multiple cores. How can I deal with score?
 
  Thanks so much for help!
  Xiaohui
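Filling in data from the other core client-side amounts to a join on the shared Key field. A sketch of the merge step (field names taken from the examples earlier in the thread):

```python
def merge_results(pdf_docs, metadata_docs, key="Key"):
    """Keep the pdf core's ordering and scores; attach metadata matched on the key."""
    meta_by_key = {m[key]: m for m in metadata_docs}
    # Metadata fields are merged over each pdf doc; unmatched keys get nothing extra.
    return [{**doc, **meta_by_key.get(doc[key], {})} for doc in pdf_docs]

merged = merge_results(
    [{"Key": "rm.pdf", "score": 1.2}],
    [{"Key": "rm.pdf", "TI": "something"}],
)
```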
 



Re: score from two cores

2010-12-03 Thread Paul
On Fri, Dec 3, 2010 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote:
 But why do you have two cores in the first place? Is it really necessary or
 is it just
 making things more complex?

I don't know why the OP wants two cores, but I ran into this same
problem and had to abandon using a second core. My use case is: I have
lots of slowing-changing documents, and a few often-changing
documents. Those classes of documents are updated by different people
using different processes. I wanted to split them into separate cores
so that:

1) The large core wouldn't change except deliberately so there would
be less chance of a bug creeping in. Also, that core is the same on
different servers, so they could be replicated.

2) The small core would update and optimize quickly and the data in it
is different on different servers.

The problem is that the search results should return relevancy as if
there were only one core.


highlighting wiki confusion

2010-12-03 Thread Lance Norskog
http://wiki.apache.org/solr/HighlightingParameters?#hl.highlightMultiTerm

If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries. Default is false.  Solr1.4. This
parameter makes sense for Highlighter only.

I think this meant 'for PhraseHighlighter only'?


-- 
Lance Norskog
goks...@gmail.com


Re: Restrict access to localhost

2010-12-03 Thread Tom

If you are using another app to create the index, I think you can remove the
update servlet mapping in the web.xml.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-access-to-localhost-tp2004475p2014129.html
Sent from the Solr - User mailing list archive at Nabble.com.