How do I add a custom field?

2011-07-03 Thread Gabriele Kahlout
Hello,

I want to have an additional field that appears for every document in
search results. I understand that I should do this by adding the field to
the schema.xml, so I add:

Then I restart Solr (so that it loads the new schema.xml) and make a query
specifying that it should return myField too, but it doesn't. Will it work
only for newly indexed documents? Am I missing something?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: How do I add a custom field?

2011-07-03 Thread lee carroll
Hi Gabriele,
Did you index any docs with your new field?

The results will just bring back docs and whatever fields they have. They
won't bring back "null" fields just because they are in your schema. Lucene
is schema-less.
Solr adds the schema to make it nice to administer and very powerful to use.





On 3 July 2011 11:01, Gabriele Kahlout wrote:
> Hello,
>
> I want to have an additional field that appears for every document in
> search results. I understand that I should do this by adding the field to
> the schema.xml, so I add:
>     indexed="false"/>
> Then I restart Solr (so that it loads the new schema.xml) and make a query
> specifying that it should return myField too, but it doesn't. Will it work
> only for newly indexed documents? Am I missing something?


Re: non-alphanumeric character searching

2011-07-03 Thread Erick Erickson
I'd start by removing lots of stuff, particularly
WordDelimiterFilterFactory. That's splitting your input up by non-alpha
characters.

If you really want just the string stored, try just using KeywordTokenizer
and LowerCaseFilter (although ASCIIFolding... wouldn't hurt).

But the best way to understand all the effects of various analysis chains is
to use the admin/analysis page (be sure to turn the verbose output on).
It'll show you exactly what produces what transformations. Another very
useful tool is adding &debugQuery=on to your URLs and looking at the parsed
output.

Oh, and I expect that some of your unexpected results are a result of
searching against your default field. Searching "example:stuff" will search
against the field "example". Searching "stuff" will search against the
<defaultSearchField> in your schema.xml (note that &debugQuery=on will show
this).

Hope this helps
Erick
On Jul 1, 2011 11:19 AM, "Lisa Riggle" wrote:
> Hi Everyone!
>
> I'm very new to solr and have a question that I hope you all can answer.
>
> My boss has me learning solr for work, with the specific goal of
> improving the schema on one of the cores of our site. This core
> consists of nothing but company names from our database, so I think that
> makes things easier, since there's no need to worry about parsing email
> addresses or URLs or anything.
>
> Anyway, I am running into some problems with non-alphanumeric characters
> in company names causing searches to return wild results. For example,
> there is 1 company in our database stored as /HPC Inter@ctive
> (ApartmentGuide.com)/. In my test script, I have a couple of different
> search strings that don't seem to return consistent results. For
> example:/hpc inter@ctive/ returns 1 result (yay), but /hpc inter@ctive
> (apartment/ and /hpc inter@ctive (apartmentguide/ both return 0
> results. /inter@ctive/ by itself returns 832 results.
>
> Among other issues, I'm having a heck of a time trying to figure out how
> to make solr just search for "inter@ctive" as a whole word instead of
> splitting it up at the @ and searching for "inter" and "ctive".
>
> How do I get solr to ignore special characters, like @, and just treat
> it as part of the string?
>
> I've spent some time trying out different tokenizers and filters, and
> rearranging the order of some of the filters. Doing that does affect
> the results at times, but mostly I get the results listed above. I also
> tried using the PatternReplaceFilterFactory to just remove all special
> characters from the index/search strings, but I'm fantastically bad at
> regex, so that didn't work either.
>
> I appreciate any and all advice.
> Thanks!
> --Lisa
>
> --
>
> I'm running a default install of solr 3.2 with the following schema:
>
> [schema.xml markup not preserved in the archive; the only surviving
> fragments are two identical attribute lists:
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>]
>
>
> I don't know what other information is needed to point me in the right
> direction, but please let me know if there's something I can send that
> will be of assistance.
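
A rough way to picture Erick's suggestion above: the sketch below, in plain
Java with no Lucene dependency, contrasts keeping the whole input as a single
lowercased token (roughly what KeywordTokenizer followed by LowerCaseFilter
produces) with splitting on non-alphanumeric characters. The class and method
names are hypothetical, and the splitting rule is only a crude stand-in for
WordDelimiterFilterFactory, whose real behavior depends on its attributes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this is not Lucene or Solr code.
final class AnalysisSketch {

    /** Whole input kept as one lowercased token. */
    static List<String> keywordLowercase(String input) {
        return Collections.singletonList(input.toLowerCase(Locale.ROOT));
    }

    /** Lowercase, then split on runs of characters other than a-z and 0-9. */
    static List<String> splitOnNonAlphanumeric(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "HPC Inter@ctive (ApartmentGuide.com)";
        System.out.println(keywordLowercase(name));
        System.out.println(splitOnNonAlphanumeric(name));
    }
}
```

With the single-token form, "inter@ctive" survives intact, but a query then
has to match the whole stored name rather than one word inside it.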


Re: upgraded from 2.9 to 3.x, problems. help?

2011-07-03 Thread Erick Erickson
Can you post the results of adding &debugQuery=on to your two versions? And
have you re-indexed or not?

Best
Erick
On Jul 1, 2011 12:31 PM, "dhastings" wrote:
> I guess what I'm asking is how to set up Solr/Lucene to find
> yale l.j.
> yale l. j.
> yale l j
> as all the same thing.
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/upgraded-from-2-9-to-3-x-problems-help-tp3129348p3129520.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I add a custom field?

2011-07-03 Thread Gabriele Kahlout
Is there a way I can compute and add the field to all indexed documents
without re-indexing? MyField counts the number of terms per document (unique
word count).

On Sun, Jul 3, 2011 at 12:24 PM, lee carroll
wrote:

> Hi Gabriele,
> Did you index any docs with your new field?
>
> The results will just bring back docs and what fields they have. They won't
> bring back "null" fields just because they are in your schema. Lucene
> is schema-less.
> Solr adds the schema to make it nice to administer and very powerful to
> use.



-- 
Regards,
K. Gabriele



Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-03 Thread Thomas Heigl
Hello,

I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr 3.3.0
as result grouping was the only reason for us to stay with the trunk.
Everything worked like a charm except for one of our queries, where we group
results by the owning user and sort by distance.

A simplified example for my query (that still fails) looks like this:

q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc


The exception thrown is:

Caused by: org.apache.solr.common.SolrException: Unweighted use of sort geodist(latlon(user.location_p),48.20927,16.3728)
    at org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106)
    at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
    at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:81)
    at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(TermFirstPassGroupingCollector.java:56)
    at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587)
    at org.apache.solr.search.Grouping.execute(Grouping.java:256)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)
    ... 39 more


Any ideas how to fix this or work around this error for now? I'd really like
to move from the trunk to the stable 3.3.0 release and this is the only
problem currently keeping me from doing so.

Cheers,

Thomas


How do I compute and store a field?

2011-07-03 Thread Gabriele Kahlout
Hello,

I'm trying to add a field that counts the number of terms in a document to
my schema. So far I've been computing this value at query time. Is there a
way to compute it only once and store the field?

final SolrIndexSearcher searcher = request.getSearcher();
final SolrIndexReader reader = searcher.getReader();
final String content = "content";

// One encoded norm byte per document for the "content" field
// (null if the field has no norms).
final byte[] norms = reader.norms(content);
final int[] docLengths;
if (norms == null) {
    docLengths = null;
} else {
    docLengths = new int[norms.length];
    int i = 0;
    for (byte b : norms) {
        float docNorm = searcher.getSimilarity().decodeNormValue(b);
        int docLength = 0;
        if (docNorm != 0) {
            docLength = (int) (1 / docNorm); // reciprocal
        }
        docLengths[i++] = docLength;
    }
}
...
final NumericField docLenNormField =
        new NumericField(TestQueryResponseWriter.DOC_LENGHT);
docLenNormField.setIntValue(docLengths[id]);
doc.add(docLenNormField);

-- 
Regards,
K. Gabriele



Re: How do I add a custom field?

2011-07-03 Thread Michael Sokolov
You'll need to index the field.  I would think you would want to 
index/store the field along with the associated document, in which case 
you'll have to reindex the documents as well - there's no single-field 
update capability in Lucene (yet?).


-Mike
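
To illustrate the point about reindexing: since a single field cannot be
added to an already indexed document, one approach is to rebuild each
document from its source, attach the computed value, and submit the whole
document again. The sketch below is plain Java with no Solr or Lucene
dependency. The class and method names, the whitespace tokenization, and the
field names "content" and "myField" are assumptions for illustration only; a
real count would have to follow the field's actual analysis chain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; documents are modeled as plain maps.
final class ReindexSketch {

    /** Number of distinct lowercased, whitespace-separated terms. */
    static int uniqueTermCount(String content) {
        String trimmed = content.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return 0;
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+"))).size();
    }

    /** Copy of the source document with the computed count added as countField. */
    static Map<String, Object> withTermCount(Map<String, Object> sourceDoc,
                                             String contentField,
                                             String countField) {
        Map<String, Object> doc = new LinkedHashMap<>(sourceDoc);
        Object content = sourceDoc.get(contentField);
        doc.put(countField, uniqueTermCount(content == null ? "" : content.toString()));
        return doc; // this full document is what would be sent for reindexing
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", "1");
        source.put("content", "the cat saw the Cat");
        System.out.println(withTermCount(source, "content", "myField"));
    }
}
```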

On 7/3/2011 1:09 PM, Gabriele Kahlout wrote:

> Is there a way I can compute and add the field to all indexed documents
> without re-indexing? MyField counts the number of terms per document (unique
> word count).


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-07-03 Thread Shawn Heisey

On 7/2/2011 12:34 PM, Yonik Seeley wrote:

> OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes
> (unoptimized had different numbers of segments so I didn't try that).
> 3x (as of today) was 28% faster at a large filter query (300 terms in
> one big disjunction, with each term matching ~1000 docs).


A lot of the terms used in my filter queries may match hundreds of 
thousands or even millions of documents.  The largest search group 
(sg:stdp) matches about 1.4 million out of 9.5 million docs on each 
shard, and is probably present in most filter queries.


Right now I have the default termIndexInterval of 128, and a 
setTermIndexDivisor of 8.  I think this probably has the same memory 
footprint as a termIndexInterval of 1024, but because it can do seeks in 
the tii file (taking good advantage of disk cache) before it ultimately 
seeks in the tis file, there are probably fewer seeks.  My warm time is 
slightly better than it was with the interval at 1024, and my average 
query speed hasn't changed much.  I am going to try an interval of 64 
and a divisor of 16.
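
The arithmetic behind that comparison can be sketched as follows, assuming
the in-memory spacing between index entries is simply the interval multiplied
by the divisor. The class name, method names, and the unique-term count are
hypothetical and chosen only for illustration.

```java
// Hypothetical illustration only; this is not Lucene or Solr code.
final class TermIndexSketch {

    /** Spacing, in terms, between index entries held in memory. */
    static long effectiveInterval(int termIndexInterval, int termIndexDivisor) {
        return (long) termIndexInterval * termIndexDivisor;
    }

    /** Approximate number of index entries held in memory. */
    static long entriesInMemory(long uniqueTerms, int termIndexInterval, int termIndexDivisor) {
        return uniqueTerms / effectiveInterval(termIndexInterval, termIndexDivisor);
    }

    public static void main(String[] args) {
        long uniqueTerms = 50_000_000L; // made-up figure
        int[][] settings = { {128, 8}, {1024, 1}, {64, 16} };
        for (int[] s : settings) {
            System.out.printf("interval=%d divisor=%d effective=%d entries=%d%n",
                    s[0], s[1],
                    effectiveInterval(s[0], s[1]),
                    entriesInMemory(uniqueTerms, s[0], s[1]));
        }
    }
}
```

All three settings give the same in-memory spacing of 1024 terms; what
differs is how finely the on-disk term index is sampled.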


I'm interested in other performance enhancing ideas that don't involve 
tweaking tons of options all at the same time.  I think my best bet for 
performance is adding more memory, of course.


Shawn



Custom Cache cleared after a commit?

2011-07-03 Thread arian487
I know the queryResultCache and similar caches live only until the next
commit, but I'm wondering if custom caches behave the same way? I'd actually
rather have a custom cache which is not cleared at all. I want to give the
elements of this cache a 6-hour TTL (or some other time frame), but I never
want it to be cleared on a commit. Is this possible using SolrCache?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136345.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Cache cleared after a commit?

2011-07-03 Thread Yonik Seeley
On Sun, Jul 3, 2011 at 10:52 PM, arian487 wrote:
> I know the queryResultCache and similar caches live only until the next
> commit, but I'm wondering if custom caches behave the same way? I'd actually
> rather have a custom cache which is not cleared at all.

That's not currently possible.
The nature of Solr's caches is that they are completely transparent -
it doesn't matter whether a cache is used or not, the response should
always be the same.  This is analogous to caching the fact that 2*2 =
4.  Put another way, Solr's caches are only for increasing request
throughput, and should not affect what response a client receives.

-Yonik
http://www.lucidimagination.com


how to improve query result time.

2011-07-03 Thread Jason, Kim
Hi All,
I have complex phrase queries that include wildcards
(e.g. q="conn* pho*"~2 OR "inter* pho*"~2 OR ...).
These queries take a long time to return results.
I tried reindexing after changing termIndexInterval to 8, hoping to reduce
the query time by loading more of the term index into memory.
I expected queries to get faster, but they didn't.
I suspect more of the time is spent searching the .frq/.prx files...
Any ideas for improving query result time?

I'm using Solr 1.4 and schema.xml is below.

[schema.xml content not preserved in the archive]

Thanks in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-query-result-time-tp3136554p3136554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Cache cleared after a commit?

2011-07-03 Thread arian487
I guess I'll have to use something other than SolrCache to get what I want
then.  Or I could use SolrCache and just change the code (I've already done
so much of this anyway...).  Anyway, thanks for the reply.
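
One possible shape for "something other than SolrCache" is a small
time-to-live cache kept outside Solr's cache lifecycle, so a commit never
clears it. The sketch below is plain Java; the class name TtlCache, its
methods, and the injectable clock are all hypothetical and not part of Solr.
Because entries can outlive a commit, such a cache may return values that no
longer match the index, which is exactly the transparency property Yonik
describes for Solr's own caches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this is not a SolrCache implementation.
final class TtlCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    TtlCache(long ttlMillis) {
        this(ttlMillis, System::currentTimeMillis);
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAt) {
            map.remove(key, e); // remove only if it is still the same expired entry
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(6L * 60 * 60 * 1000); // 6 hours
        cache.put("user:42", "cached value");
        System.out.println(cache.get("user:42"));
    }
}
```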

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136580.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-03 Thread Romi
Shawn, when I reindex the data using full-import I get:
_0.fdt        3310
_0.fdx        23
_0.frq        857
_0.nrm        31
_0.prx        1748
_0.tis        350
_1.fdt        3310
_1.fdx        23
_1.fnm        1
_1.frq        857
_1.nrm        31
_1.prx        1748
_1.tii        5
_1.tis        350
segments.gen  1
segments_3    1

All of the _1 files are marked as archived (A).

When I ran the full import again (for testing), I got _1 and _2 files, with
all of the _2 files marked as archived. What does that mean?
What I don't understand is this: a full import deletes the old index and
creates a new one, so why am I still getting the old files?




-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3136664.html
Sent from the Solr - User mailing list archive at Nabble.com.