Re: DIH: alternative approach to deltaQuery

2010-09-20 Thread Lukas Kahwe Smith
Hi,

ok, since it didn't seem like there was interest in documenting this approach on the 
wiki, I have simply documented it on my blog:
http://pooteeweet.org/blog/1827

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: DIH: alternative approach to deltaQuery

2010-09-20 Thread Paul Dhaliwal
Thank you very much Shawn.

Paul


On Fri, Sep 17, 2010 at 12:11 PM, Shawn Heisey elyog...@elyograg.org wrote:

  On 9/17/2010 3:01 AM, Paul Dhaliwal wrote:

 Another feature missing in DIH is the ability to pass parameters into your
 queries. If one could pass a named or positional parameter for an entity
 query, it would give them a lot of freedom to optimize their delta or full
 load queries. One can even get creative with entity and delta queries that can
 take ranges and pass timestamps that depend on external sources.


 Paul,

 If I understand what you are saying, this ability already exists.  I am
 using it with Solr 1.4.1.  I sent some detailed information on how to do it
 to the list early last month:

 http://www.mail-archive.com/solr-user@lucene.apache.org/msg40466.html

 Shawn
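
A minimal sketch of the approach Shawn refers to (untested; the entity,
column and parameter names here are illustrative, not from his message): in
data-config.xml an entity query can reference a request parameter,

<entity name="item"
        query="SELECT * FROM item WHERE last_modified &gt; '${dataimporter.request.lastMod}'"/>

and the value is supplied on the URL that triggers the import:

http://localhost:8983/solr/dataimport?command=full-import&lastMod=2010-09-01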




Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Markus.Rietzler
the second time we had the error

java.lang.OutOfMemoryError: PermGen space

and solr stopped responding.

we use the default jetty installation with jdk1.6.0_21. after the last
occurrence i tried to set up the garbage collector properly.
these are my settings:
these are my settings:

-D64 -server -Xms892m -Xmx2048m -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:-HeapDumpOnOutOfMemoryError
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

as far as i understood, -XX:+CMSClassUnloadingEnabled and
-XX:+CMSPermGenSweepingEnabled should also clean up the PermGen space.

what can we do?

ok, at the moment solr is not stopped and keeps running all the time. maybe
we should do a regular (daily) restart, which should avoid the problem.
but how can we adjust the garbage collection settings so that the PermGen
space does not run out...


markus


Re: Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Peter Karich
see
http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error

and the links there. There seems to be no good solution :-/
The only reliable solution is to restart before you run out of
PermGen space (use jvisualvm to monitor it).
Also try to increase -XX:MaxPermSize to make the restart interval longer;
using JRebel or something like that could probably help too.
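
For example, a combined flag set along the lines of Markus's settings plus
an explicit PermGen cap might look like this (the 256m value is purely
illustrative; size it from what jvisualvm shows):

-D64 -server -Xms892m -Xmx2048m -XX:MaxPermSize=256m
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled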

Regards,
Peter.

 the second time we had the error

   java.lang.OutOfMemoryError: PermGen space

 and solr stopped responding.

 we use the default jetty installation with jdk1.6.0_21. after the last
 occurrence i tried to set up the garbage collector properly.
 these are my settings:

   -D64 -server -Xms892m -Xmx2048m -XX:+UseConcMarkSweepGC
 -XX:+UseParNewGC -XX:-HeapDumpOnOutOfMemoryError
 -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

 as far as i understood, -XX:+CMSClassUnloadingEnabled and
 -XX:+CMSPermGenSweepingEnabled should also clean up the PermGen space.

 what can we do?

 ok, at the moment solr is not stopped and keeps running all the time. maybe
 we should do a regular (daily) restart, which should avoid the problem.
 but how can we adjust the garbage collection settings so that the PermGen
 space does not run out...


 markus

   


-- 
http://jetwick.com twitter search prototype



Solr UIMA integration

2010-09-20 Thread Tommaso Teofili
Hi all,
I am working on integrating Apache UIMA as an UpdateRequestProcessor for
Apache Solr and I am now at the first working snapshot.
I put the code on GoogleCode [1] and you can take a look at the tutorial
[2].

I would be glad to donate it to the Apache Solr project, as I think it could
be a useful module to trigger automatic content extraction while indexing
documents.

At the moment the UIMAUpdateRequestProcessor base implementation can
automatically extract document's sentences, language, keywords, concepts and
named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
AlchemyAPIAnnotator components (but it can be easily expanded).

Any feedback is welcome.
Have a nice day.
Tommaso

[1] : http://code.google.com/p/solr-uima/
[2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial


Restrict possible results based on relational information

2010-09-20 Thread Stefan Matheis
Hi List,

this is my first message on this list, so if there's something
missing/incorrect, please let me know :)

the current problem, described briefly and followed by a short example,
is the following:

users can send private messages, and the selection of recipients is done via
auto-complete. Therefore we need to restrict the possible results based on
the user's confirmed contacts - but I have absolutely no idea how to do that
:/ Add all confirmed contacts to the index and use it like a type of
relation? Pass the list of confirmed contacts together with the query?

let's say we have "John Doe", who creates a new message. Typing "doe"
should suggest "Jane Doe" and "Thomas Doe" - but not "Another Doe", who is
also a user, but none of his confirmed contacts. Maybe we also get "John
Doe" as a possible match, but that should be okay in the first place - if we
could exclude the user himself as well, that's of course better.

every user record has an id, plus additional fields for firstname and lastname.
Confirmed contacts are, simply put, records with fields from:user-id and
to:user-id, with no additional information about the type of
relationship. But none of this relationship information is
currently submitted to the Solr index.

if you need more information to answer this not-very-concrete question (and
I'm sure I've missed some relevant info) just ask, please :)

Regards
Stefan


Re: Restrict possible results based on relational information

2010-09-20 Thread Chantal Ackermann
hi Stefan

 users can send private messages, and the selection of recipients is done via
 auto-complete. Therefore we need to restrict the possible results based on
 the user's confirmed contacts - but I have absolutely no idea how to do that
 :/ Add all confirmed contacts to the index and use it like a type of
 relation? Pass the list of confirmed contacts together with the query?

This does not sound like a search query because:
1. you know the user
2. you know his/her list of confirmed contacts

If both statements are true, the list of confirmed contacts should be
accessible via a JSON URL call so that you can load it into an autocomplete
dropdown.
Solr need not be involved in this case (but you can of course store the
list of confirmed contacts in a multivalued field per user if you need
it for other searches or faceting).

Cheers,
Chantal
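
For illustration (not from the original mail), the payload of such a JSON
URL could be as simple as:

[{"id": 123, "name": "Jane Doe"}, {"id": 456, "name": "Thomas Doe"}]

with the autocomplete widget filtering it client-side as the user types.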



Solr Analyzer results before the actual query.

2010-09-20 Thread zackko

Hi to all the Forum from a new subscriber,

I’m working on the server-side search solution of the company where I’m
currently employed. I have a problem at the moment: when I submit
a search to Solr I want to see the “Analyzer results” for the search terms
(query), with all the filters applied as defined in types.xml. I want the
analyzer output displayed BEFORE the actual search is performed, so I can
decide at that point whether to run the proper search or leave the user
with no results.

The problem is more or less described in this issue:
https://issues.apache.org/jira/browse/SOLR-261. In summary: is it possible
to have the analyzer results (in code) before running the actual Solr
search?

I'm quite new to Solr, so maybe this issue has already been discussed in
another thread, but I'm unable to find it at the moment. If anybody has
any clue on how to do that, any suggestion will be more than welcome.
 
Thanks very much in advance for your answer.
 
Best wishes.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Analyzer-results-before-the-actual-query-tp1528692p1528692.html
Sent from the Solr - User mailing list archive at Nabble.com.


NGram and word boundaries?

2010-09-20 Thread Harry Hochheiser
I've got a question regarding NGramFilterFactory. It seems to work
very well, but I've had trouble getting it to work with other filters.


Specifically, if I have an index analyzer that uses a
StandardTokenizerFactory to tokenize and follows it up with an
NGramFilterFactory, it does a fine job of handling ngrams, but it
doesn't respect word boundaries: queries will match across whitespace.
Using a modified version of the monitor.xml file from the example: if I
have a field containing the text "Dell Widescreen UltraSharp 3007WFP"
and I provide the search query "en U", it will match.

I'd like to have the NGramFilterFactory match only _within_ words: how
can I go about doing that? I'd like to avoid having to manually
pre-process the query.

I can provide a detailed schema and examples if they'd help.

thanks!
-harry
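
For reference, a sketch of the index analyzer Harry describes, as it might
appear in schema.xml (the type name and gram sizes are illustrative, and
this reproduces the reported behaviour rather than fixing it):

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

Since each word is tokenized first and only then split into n-grams, a
two-token query like "en U" can match n-grams from two different words
("Widescreen" and "UltraSharp") under the default OR operator, which is one
way a match can appear to cross whitespace.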


Re: Solr for statistical data

2010-09-20 Thread Kjetil Ødegaard
On Thu, Sep 16, 2010 at 11:48 AM, Peter Karich peat...@yahoo.de wrote:

 Hi Kjetil,

 is this custom component (which performs group by + calcs stats)
 somewhere available?
 I would like to do something similar. Would you mind to share if it
 isn't already available?

 The grouping stuff sounds similar to
 https://issues.apache.org/jira/browse/SOLR-236

 where you can have mem problems too ;-) or see:
 https://issues.apache.org/jira/browse/SOLR-1682


Thanks for the links! These patches seem to provide somewhat similar
functionality, I'll investigate if they're implemented in a similar way too.

We've developed this component for a client, so while I'd like to share it I
can't make any promises. Sorry.


  Any tips or similar experiences?

 you want to decrease memory usage?


Yes. Specifically, I would like to keep the heap at 4 GB. Unfortunately I'm
still seeing some OutOfMemoryErrors so I might have to up the heap size
again.

I guess what I'm really wondering is if there's a way to keep memory use
down, while at the same time not sacrificing the performance of our queries.
The queries have to run through all values for a field in order to calculate
the sum, so it's not enough to just cache a few values.

The code which fetches values from the index uses
FieldCache.DEFAULT.getStringIndex for a field, and then indexes into it like this:

// order maps docId -> term ordinal; lookup maps ordinal -> indexed term form
FieldCache.StringIndex stringIndex = FieldCache.DEFAULT.getStringIndex(reader, fieldName);
FieldType fieldType = searcher.getSchema().getFieldType(fieldName);
fieldType.indexedToReadable(stringIndex.lookup[stringIndex.order[documentId]]);

Is there a better way to do this? Thanks.


---Kjetil


Re: Searching solr with a two word query

2010-09-20 Thread noel
Here is my raw query:
q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on

and here is what I get on the debugQuery:
<lst name="debug">
<str name="rawquerystring">opening excellent AND presentation_id:294 AND type:blob</str>
<str name="querystring">opening excellent AND presentation_id:294 AND type:blob</str>
<str name="parsedquery">all_text:open +all_text:excel +presentation_id:294 +type:blob</str>
<str name="parsedquery_toString">all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob</str>
<lst name="explain">
<str name="1435675blob">

3.1143723 = (MATCH) sum of:
  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
0.5531408 = queryWeight(all_text:open), product of:
  5.3283896 = idf(docFreq=162, maxDocs=12359)
  0.10381013 = queryNorm
0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
  1.0 = tf(termFreq(all_text:open)=1)
  5.3283896 = idf(docFreq=162, maxDocs=12359)
  0.15625 = fieldNorm(field=all_text, doc=4457)
  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
0.7043054 = queryWeight(all_text:excel), product of:
  6.7845535 = idf(docFreq=37, maxDocs=12359)
  0.10381013 = queryNorm
1.0600865 = (MATCH) fieldWeight(all_text:excel in 4457), product of:
  1.0 = tf(termFreq(all_text:excel)=1)
  6.7845535 = idf(docFreq=37, maxDocs=12359)
  0.15625 = fieldNorm(field=all_text, doc=4457)
  1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4457), product of:
0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
  4.1625586 = idf(docFreq=522, maxDocs=12359)
  0.10381013 = queryNorm
4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4457), product of:
  1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
  4.1625586 = idf(docFreq=522, maxDocs=12359)
  1.0 = fieldNorm(field=presentation_id, doc=4457)
  0.108517066 = (MATCH) weight(type:blob in 4457), product of:
0.10613751 = queryWeight(type:blob), product of:
  1.0224196 = idf(docFreq=12084, maxDocs=12359)
  0.10381013 = queryNorm
1.0224196 = (MATCH) fieldWeight(type:blob in 4457), product of:
  1.0 = tf(termFreq(type:blob)=1)
  1.0224196 = idf(docFreq=12084, maxDocs=12359)
  1.0 = fieldNorm(field=type, doc=4457)
</str>
<str name="1436129blob">

2.06395 = (MATCH) product of:
  2.7519336 = (MATCH) sum of:
0.84470934 = (MATCH) weight(all_text:excel in 4911), product of:
  0.7043054 = queryWeight(all_text:excel), product of:
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.10381013 = queryNorm
  1.199351 = (MATCH) fieldWeight(all_text:excel in 4911), product of:
1.4142135 = tf(termFreq(all_text:excel)=2)
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.125 = fieldNorm(field=all_text, doc=4911)
1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4911), product of:
  0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
4.1625586 = idf(docFreq=522, maxDocs=12359)
0.10381013 = queryNorm
  4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4911), product 
of:
1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
4.1625586 = idf(docFreq=522, maxDocs=12359)
1.0 = fieldNorm(field=presentation_id, doc=4911)
0.108517066 = (MATCH) weight(type:blob in 4911), product of:
  0.10613751 = queryWeight(type:blob), product of:
1.0224196 = idf(docFreq=12084, maxDocs=12359)
0.10381013 = queryNorm
  1.0224196 = (MATCH) fieldWeight(type:blob in 4911), product of:
1.0 = tf(termFreq(type:blob)=1)
1.0224196 = idf(docFreq=12084, maxDocs=12359)
1.0 = fieldNorm(field=type, doc=4911)
  0.75 = coord(3/4)
</str>
<str name="1435686blob">

1.9903867 = (MATCH) product of:
  2.653849 = (MATCH) sum of:
0.74662465 = (MATCH) weight(all_text:excel in 4468), product of:
  0.7043054 = queryWeight(all_text:excel), product of:
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.10381013 = queryNorm
  1.0600865 = (MATCH) fieldWeight(all_text:excel in 4468), product of:
1.0 = tf(termFreq(all_text:excel)=1)
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.15625 = fieldNorm(field=all_text, doc=4468)
1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4468), product of:
  0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
4.1625586 = idf(docFreq=522, maxDocs=12359)
0.10381013 = queryNorm
  4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4468), product 
of:
1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
4.1625586 = idf(docFreq=522, maxDocs=12359)
1.0 = fieldNorm(field=presentation_id, doc=4468)
0.108517066 = (MATCH) weight(type:blob in 4468), product of:
  0.10613751 = queryWeight(type:blob), product of:
1.0224196 = 

SolrCloud new....

2010-09-20 Thread satya swaroop
Hi all,
I am having 4 instances of solr on 4 systems. Each system has a
single instance of solr. I want the results from all these servers. I came
to know about SolrCloud, read about it, and worked through the example; it
was working as given in the wiki.
I am using solr 1.4 and apache tomcat. In order to implement cloud from the
solr trunk, what procedure should be followed?
1) Should I copy the libraries from cloud to trunk?
2) Should I keep the cloud module on every system?
3) I am not using any cores in solr. It is a single solr on every
system. Can solrcloud support that?
4) The example is given with jetty. Is it done the same way in tomcat?

Regards,
satya


Re: Solr for statistical data

2010-09-20 Thread Thomas Joiner
I don't know if this thread might help with your problems any, but it might
give some pointers:

http://lucene.472066.n3.nabble.com/Tuning-Solr-caches-with-high-commit-rates-NRT-td1461275.html

--Thomas

On Mon, Sep 20, 2010 at 7:58 AM, Kjetil Ødegaard
kjetil.odega...@gmail.com wrote:

 On Thu, Sep 16, 2010 at 11:48 AM, Peter Karich peat...@yahoo.de wrote:

  Hi Kjetil,
 
  is this custom component (which performs group by + calcs stats)
  somewhere available?
  I would like to do something similar. Would you mind to share if it
  isn't already available?
 
  The grouping stuff sounds similar to
  https://issues.apache.org/jira/browse/SOLR-236
 
  where you can have mem problems too ;-) or see:
  https://issues.apache.org/jira/browse/SOLR-1682
 
 
 Thanks for the links! These patches seem to provide somewhat similar
 functionality, I'll investigate if they're implemented in a similar way
 too.

 We've developed this component for a client, so while I'd like to share it
 I
 can't make any promises. Sorry.


   Any tips or similar experiences?
 
  you want to decrease memory usage?


 Yes. Specifically, I would like to keep the heap at 4 GB. Unfortunately I'm
 still seeing some OutOfMemoryErrors so I might have to up the heap size
 again.

 I guess what I'm really wondering is if there's a way to keep memory use
 down, while at the same time not sacrificing the performance of our
 queries.
 The queries have to run through all values for a field in order to
 calculate
 the sum, so it's not enough to just cache a few values.

 The code which fetches values from the index uses
 FieldCache.DEFAULT.getStringIndex for a field, and then indexes like this:

 FieldType fieldType = searcher.getSchema().getFieldType(fieldName);

 fieldType.indexedToReadable(stringIndex.lookup[stringIndex.order[documentId]]);

 Is there a better way to do this? Thanks.


 ---Kjetil



Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread PeterKerk

Hi Dennis,

Good suggestion, but I see that most of that is Solr 4.0 functionality,
which has not been released yet.
How can I still use the longitude latitude functionality (LatLonType)?

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching solr with a two word query

2010-09-20 Thread Erick Erickson
Here's an excellent description of the Lucene query operators and how they
differ from strict
boolean logic: http://www.gossamer-threads.com/lists/lucene/java-user/47928

But the short
form (and boy, doesn't the fact that the URL escapes spaces
as '+', which is also a Lucene operator, make looking at these interesting)
is that the
first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
matching your docs all by itself.

HTH
Erick

On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:

 Here is my raw query:
 q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on

 and here is what I get on the debugQuery:
 [...]

Re: Solr UIMA integration

2010-09-20 Thread Jan Høydahl / Cominvent
Hi Tommaso,

Really cool what you've done. Looking forward to testing it, and I'm sure it's 
a welcome contribution to Solr.
You can easily contribute your code by opening a JIRA issue and attaching a 
patch file.

BTW
Have you considered making the output field names configurable on a per 
instance basis? It could be done as follows:
<processor class="org.apache.solr.uima.processor.UIMAProcessorFactory">
  <str name="concept_field">concept</str>
  <str name="language_field">language</str>
  <str name="keyword_field">keyword</str>
  ...
</processor>
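
For context (an assumption about the wiring, not something stated in this
thread), such a processor would presumably be registered in an
updateRequestProcessorChain in solrconfig.xml, e.g.:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>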

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.35, Tommaso Teofili wrote:

 Hi all,
 I am working on integrating Apache UIMA as an UpdateRequestProcessor for
 Apache Solr and I am now at the first working snapshot.
 I put the code on GoogleCode [1] and you can take a look at the tutorial
 [2].
 
 I would be glad to donate it to the Apache Solr project, as I think it could
 be a useful module to trigger automatic content extraction while indexing
 documents.
 
 At the moment the UIMAUpdateRequestProcessor base implementation can
 automatically extract document's sentences, language, keywords, concepts and
 named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
 AlchemyAPIAnnotator components (but it can be easily expanded).
 
 Any feedback is welcome.
 Have a nice day.
 Tommaso
 
 [1] : http://code.google.com/p/solr-uima/
 [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial



Re: Restrict possible results based on relational information

2010-09-20 Thread Jan Høydahl / Cominvent
Hi,

You could simply create an autocomplete Solr Core with a simple schema 
consisting of id, from, to:
Let the fieldType of from be String, and in the fieldType of to you can use 
StandardTokenizer, WordDelimiterFilter and EdgeNGramFilter.

<add>
  <doc>
    <field name="id">john@mycompany.com-jane.doe@mycompany.com</field>
    <field name="from">john@mycompany.com</field>
    <field name="to">Jane Doe (jane@mycompany.com)</field>
  </doc>
  <doc>
    <field name="id">john@mycompany.com-thomas.doe@mycompany.com</field>
    <field name="from">john@mycompany.com</field>
    <field name="to">Thomas Doe (thomas@mycompany.com)</field>
  </doc>
  <doc>
    <field name="id">peter@mycompany.com-another.doe@mycompany.com</field>
    <field name="from">peter@mycompany.com</field>
    <field name="to">Another Doe (another@mycompany.com)</field>
  </doc>
</add>

Now, if your autocomplete query is like this:
wt=json&fl=to&fq=from:john@mycompany.com&q={!q.op=AND df=to}do

Your response will now be a list of valid recipients where the from field is 
the current user. By using EdgeNGramFilter on the to field, you get the effect of 
an automatic wildcard search, since John Doe will be indexed as (conceptually) 
J Jo Joh John D Do Doe
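
A sketch of such a fieldType in schema.xml (untested; the type name, filter
order and gram sizes are illustrative):

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>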

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.36, Stefan Matheis wrote:

 Hi List,
 
 this is my first message on this list, so if there's something
 missing/incorrect, please let me know :)
 
 the current problem, described briefly and followed by a short example,
 is the following:
 
 users can send private messages, and the selection of recipients is done via
 auto-complete. Therefore we need to restrict the possible results based on
 the user's confirmed contacts - but I have absolutely no idea how to do that
 :/ Add all confirmed contacts to the index and use it like a type of
 relation? Pass the list of confirmed contacts together with the query?
 
 let's say we have "John Doe", who creates a new message. Typing "doe"
 should suggest "Jane Doe" and "Thomas Doe" - but not "Another Doe", who is
 also a user, but none of his confirmed contacts. Maybe we also get "John
 Doe" as a possible match, but that should be okay in the first place - if we
 could exclude the user himself as well, that's of course better.
 
 every user record has an id, plus additional fields for firstname and lastname.
 Confirmed contacts are, simply put, records with fields from:user-id and
 to:user-id, with no additional information about the type of
 relationship. But none of this relationship information is
 currently submitted to the Solr index.
 
 if you need more information to answer this not-very-concrete question (and
 I'm sure I've missed some relevant info) just ask, please :)
 
 Regards
 Stefan



Re: Solr UIMA integration

2010-09-20 Thread Dennis Gearon
Looks like a great scraping engine technology :-)
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, Tommaso Teofili tommaso.teof...@gmail.com wrote:

 From: Tommaso Teofili tommaso.teof...@gmail.com
 Subject: Solr UIMA integration
 To: solr-user@lucene.apache.org
 Date: Monday, September 20, 2010, 3:35 AM
 Hi all,
 I am working on integrating Apache UIMA as an
 UpdateRequestProcessor for
 Apache Solr and I am now at the first working snapshot.
 I put the code on GoogleCode [1] and you can take a look at
 the tutorial
 [2].
 
 I would be glad to donate it to the Apache Solr project, as
 I think it could
 be a useful module to trigger automatic content extraction
 while indexing
 documents.
 
 At the moment the UIMAUpdateRequestProcessor base
 implementation can
 automatically extract document's sentences, language,
 keywords, concepts and
 named entities using Apache UIMA's HMMTagger,
 OpenCalaisAnnotator and
 AlchemyAPIAnnotator components (but it can be easily
 expanded).
 
 Any feedback is welcome.
 Have a nice day.
 Tommaso
 
 [1] : http://code.google.com/p/solr-uima/
 [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial
 


Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Dennis Gearon
Hmmm,
 I am about to put an engineer on our search engine requirements with the 
assumption that latitude/longitude is available in the current release of Solr 
(not knowing what that is). 

 I have been leaving the whole Solr thing to him, except enough info 
for me to understand and interface to his work. So I don't have that 
answer.

 Can someone else answer him?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, PeterKerk vettepa...@hotmail.com wrote:

 From: PeterKerk vettepa...@hotmail.com
 Subject: Re: Calculating distances in Solr using longitude latitude
 To: solr-user@lucene.apache.org
 Date: Monday, September 20, 2010, 6:53 AM
 
 Hi Dennis,
 
 Good suggestion, but I see that most of that is Solr 4.0
 functionality,
 which has not been released yet.
 How can I still use the longitude latitude functionality
 (LatLonType)?
 
 Thanks!
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


Re: Solr starting problem

2010-09-20 Thread Erick Erickson
Are you trying to implement custom code or is this a stock release?
Because if you're trying to just move a stock release over, it'd be much
simpler to just unpack the distribution (for Linux) on the linux machine
and go. It might be worth doing anyway just to compare the differences
to see what's causing your problem.

But it looks like your problem is in your Jetty configuration. I'm really
guessing that you can't start your Jetty servlet container at all

HTH
Erick

On Mon, Sep 20, 2010 at 11:19 AM, Yavuz Selim YILMAZ 
yvzslmyilm...@gmail.com wrote:

 I use solr on windows without any problem. I'm trying to run solr on
 linux (copied all files from windows to linux), but I get exceptions when I
 try to start solr (java -jar start.jar):

 java.lang.ClassNotFoundException: org.mortbay.xml.xmlConfiguration
   at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:570)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
   at org.mortbay.start.Main.start(Main.java:534)
   at org.mortbay.start.Main.start(Main.java:441)
   at org.mortbay.start.Main.main(Main.java:119)

 I checked all the jar files; the problem looks related to jetty, but I can't
 find any solution.

 Any ideas?

 Thnx.

 --

 Yavuz Selim YILMAZ



Re: Searching solr with a two word query

2010-09-20 Thread Erick Erickson
I'm missing what you really want out of your query; your
phrase "either word as a single result" just isn't connecting
in my grey matter. Could you give some example inputs and
outputs that demonstrate what you want?

Best
Erick

On Mon, Sep 20, 2010 at 11:41 AM, n...@frameweld.com wrote:

 I noticed that my defaultOperator is OR, and that does have an effect on
 what comes up. If I were to change that to AND, it's an exact match to
 my query, but I would like similar matches with either word as a single
 result. Is there another value I can use? Or maybe I should use another
 query parser?

 Thanks.
 - Noel

 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: Monday, September 20, 2010 10:05am
 To: solr-user@lucene.apache.org
 Subject: Re: Searching solr with a two word query

 Here's an excellent description of the Lucene query operators and how they
 differ from strict
 boolean logic:
 http://www.gossamer-threads.com/lists/lucene/java-user/47928

 But the
 short
 form (and boy, doesn't the fact that the URL escapes spaces
 as '+', which is also a Lucene operator, make looking at these interesting)
 is that the
 first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
 matching your docs all by itself.

 HTH
 Erick

 On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:

  Here is my raw query:
 
  q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
 
  and here is what I get on the debugQuery:
   [...]

logging for solr

2010-09-20 Thread Christopher Gross
I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
Right now the logs all go to the catalina.out file, which has been
growing rather large.  I have to shut down the servers periodically to
clear out that logfile because it keeps getting large and giving disk
space warnings.

I've tried looking around for instructions on configuring the logging
for Solr, but I'm not having much luck.  Can someone please point me
in the right direction to set up the logging for Solr?  If I can get
it into rolling logfiles, I can just have a cron job take out the old
ones and not have to restart to do cleanup.

Please don't tell me to upgrade the software -- it is not an option at
this point.  I'm sure that the latest versions have it working better,
but right now I am unable to upgrade Solr or Tomcat to new versions.

Thanks!

-- Chris


Re: Searching solr with a two word query

2010-09-20 Thread noel
Say I had a two-word query such as "opening excellent"; I would like it to 
return something like:

opening excellent
opening
opening
opening
excellent
excellent
excellent

Instead of:
opening excellent
excellent
excellent
excellent

If I did a search, I would like the first word alone to also show up in the 
results, because currently my results show both words in one result and only 
the second word for the rest of the results. I've done a search on each word by 
itself, and there are results for them.

Thanks.

-Original Message-
From: Erick Erickson erickerick...@gmail.com
Sent: Monday, September 20, 2010 2:37pm
To: solr-user@lucene.apache.org
Subject: Re: Searching solr with a two word query

I'm missing what you really want out of your query; your
phrase "either word as a single result" just isn't connecting
in my grey matter. Could you give some example inputs and
outputs that demonstrate what you want?

Best
Erick

On Mon, Sep 20, 2010 at 11:41 AM, n...@frameweld.com wrote:

 I noticed that my defaultOperator is OR, and that does have an effect on
 what comes up. If I were to change that to AND, it's an exact match to
 my query, but I would like similar matches with either word as a single
 result. Is there another value I can use? Or maybe I should use another
 query parser?

 Thanks.
 - Noel

 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: Monday, September 20, 2010 10:05am
 To: solr-user@lucene.apache.org
 Subject: Re: Searching solr with a two word query

 Here's an excellent description of the Lucene query operators and how they
 differ from strict
 boolean logic:
 http://www.gossamer-threads.com/lists/lucene/java-user/47928

 But the
 short
 form (and boy, doesn't the fact that the URL escapes spaces
 as '+', which is also a Lucene operator, make looking at these interesting)
 is that the
 first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
 matching your docs all by itself.

 HTH
 Erick

 On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:

  Here is my raw query:
 
 q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
 
  and here is what I get on the debugQuery:
  [...]

Re: logging for solr

2010-09-20 Thread Jak Akdemir
It is quite easy to modify the defaults. Solr uses the default
java.util.logging configuration of the JVM it runs in. It can be bound as a
start parameter or defined externally in
../tomcat/conf/logging.properties.

Simply put, it is enough to remove all contents (backup first) of
../tomcat/conf/logging.properties and write .level = SEVERE
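
If rolling logfiles are the goal (as in Chris's case), a sketch using
Tomcat's JULI FileHandler, which rotates daily, might look like this
(handler prefix and names as in the stock Tomcat logging.properties;
untested, adjust paths to taste):

handlers = 1catalina.org.apache.juli.FileHandler
.level = WARNING
1catalina.org.apache.juli.FileHandler.level = WARNING
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = solr.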

This will change the root logger from unset to SEVERE. Of course
you can switch it to WARNING or INFO too.

You can observe the changes from http://localhost:8080/solr/admin/logging
(or simply the ~/admin/logging page).

Details are here:

http://wiki.apache.org/tomcat/Logging_Tutorial

http://tomcat.apache.org/tomcat-6.0-doc/logging.html

Jak

On Mon, Sep 20, 2010 at 10:32 PM, Christopher Gross cogr...@gmail.com wrote:

 I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
 Right now the logs all go to the catalina.out file, which has been
 growing rather large.  I have to shut down the servers periodically to
 clear out that logfile because it keeps getting large and giving disk
 space warnings.

 I've tried looking around for instructions on configuring the logging
 for Solr, but I'm not having much luck.  Can someone please point me
 in the right direction to set up the logging for Solr?  If I can get
 it into rolling logfiles, I can just have a cron job take out the old
 ones and not have to restart to do cleanup.

 Please don't tell me to upgrade the software -- it is not an option at
 this point.  I'm sure that the latest versions have it working better,
 but right now I am unable to upgrade Solr or Tomcat to new versions.

 Thanks!

 -- Chris


Re: logging for solr

2010-09-20 Thread Christopher Gross
Thanks Jak!  That was just what I was looking for!

-- Chris



On Mon, Sep 20, 2010 at 4:25 PM, Jak Akdemir jakde...@gmail.com wrote:
 It is quite easy to modify the defaults. Solr uses the default
 java.util.logging configuration of the JVM it runs in. It can be bound as a
 start parameter or defined externally in
 ../tomcat/conf/logging.properties.

 Simply put, it is enough to remove all contents (backup first) of
 ../tomcat/conf/logging.properties and write .level = SEVERE

 This will change the root logger from unset to SEVERE. Of course
 you can switch it to WARNING or INFO too.

 You can observe changes from http://localhost:8080/solr/admin/logging
 or simply ~/admin/logging  pages.

 Details are here:

 http://wiki.apache.org/tomcat/Logging_Tutorial

 http://tomcat.apache.org/tomcat-6.0-doc/logging.html

 Jak

 On Mon, Sep 20, 2010 at 10:32 PM, Christopher Gross cogr...@gmail.com wrote:

 I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
 Right now the logs all go to the catalina.out file, which has been
 growing rather large.  I have to shut down the servers periodically to
 clear out that logfile because it keeps getting large and giving disk
 space warnings.

 I've tried looking around for instructions on configuring the logging
 for Solr, but I'm not having much luck.  Can someone please point me
 in the right direction to set up the logging for Solr?  If I can get
 it into rolling logfiles, I can just have a cron job take out the old
 ones and not have to restart to do cleanup.

 Please don't tell me to upgrade the software -- it is not an option at
 this point.  I'm sure that the latest versions have it working better,
 but right now I am unable to upgrade Solr or Tomcat to new versions.

 Thanks!

 -- Chris



Re: Searching solr with a two word query

2010-09-20 Thread Tom Hill
It will probably be clearer if you don't use the pseudo-boolean
operators, and just use + for required terms.

If you look at your output from debug, you see your query becomes:

    all_text:open +all_text:excel +presentation_id:294 +type:blob

Note that all_text:open does not have a + sign, but
all_text:excel has one. So all_text:open is not required, but
all_text:excel is.

I think this is because AND marks both of its operands as required
(which puts the + on all_text:excel), but open has no explicit
operator, so the default OR applies, which marks that term as optional.

What I would suggest you do is:

   opening excellent +presentation_id:294 +type:blob

which I think is much clearer.

I think you could also do
  opening excellent presentation_id:294 AND type:blob
but I think it's  non-obvious how the result will differ from
  opening excellent AND presentation_id:294 AND type:blob
So I wouldn't use either of the last two.


Tom
p.s. Not sure what is going on with the last lines of your debug
output for the query. Is that really what shows up after presentation
ID? I see Euro, hash mark, zero, semi-colon, and H with stroke:

<str name="parsedquery_toString">
all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
</str>

On Mon, Sep 20, 2010 at 12:46 PM, n...@frameweld.com wrote:

 Say I had a two-word query such as "opening excellent"; I would like it 
 to return something like:

 opening excellent
 opening
 opening
 opening
 excellent
 excellent
 excellent

 Instead of:
 opening excellent
 excellent
 excellent
 excellent

 If I did a search, I would like the first word alone to also show up in the 
 results, because currently my results show both words in one result and only 
 the second word for the rest of the results. I've done a search on each word 
 by itself, and there are results for them.

 Thanks.

 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: Monday, September 20, 2010 2:37pm
 To: solr-user@lucene.apache.org
 Subject: Re: Searching solr with a two word query

 I'm missing what you really want out of your query; your
 phrase "either word as a single result" just isn't connecting
 in my grey matter. Could you give some example inputs and
 outputs that demonstrate what you want?

 Best
 Erick

 On Mon, Sep 20, 2010 at 11:41 AM, n...@frameweld.com wrote:

  I noticed that my defaultOperator is OR, and that does have an effect on
  what comes up. If I were to change that to AND, it's an exact match to
  my query, but I would like similar matches with either word as a single
  result. Is there another value I can use? Or maybe I should use another
  query parser?
 
  Thanks.
  - Noel
 
  -Original Message-
  From: Erick Erickson erickerick...@gmail.com
  Sent: Monday, September 20, 2010 10:05am
  To: solr-user@lucene.apache.org
  Subject: Re: Searching solr with a two word query
 
  Here's an excellent description of the Lucene query operators and how they
  differ from strict
  boolean logic:
  http://www.gossamer-threads.com/lists/lucene/java-user/47928

  But the
  short
  form (and boy, doesn't the fact that the URL escapes spaces
  as '+', which is also a Lucene operator, make looking at these interesting)
  is that the
  first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
  matching your docs all by itself.
 
  HTH
  Erick
 
  On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:
 
   Here is my raw query:
  
  q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
  
   and here is what I get on the debugQuery:
   [...]

RE: Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Dennis Gearon
You know, if there were some sort of hexagonal/pentagonal, soccer-ball 
coordinate system for the Earth, all you'd need is an entry's distance to each 
of the 6/5 facets of the cell it was in, the distance between any two facets, 
and the distance from the endpoint to all of its facets. A giant table of 
precomputed distances, or some numbering system of coordinates that 
automatically gave the two facets and the distance between the faces, would be 
even better.

Then just look up the distances and add them.

Still waiting for the coordinate system though :-). If one could get it to 10 
meters resolution, wow.




Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, Markus Jelsma markus.jel...@buyways.nl wrote:

 From: Markus Jelsma markus.jel...@buyways.nl
 Subject: RE: Re: Calculating distances in Solr using longitude latitude
 To: solr-user@lucene.apache.org
 Date: Monday, September 20, 2010, 1:00 PM
 Hi,

 In the early Solr 1.3 times we had an index with
 leisure-time objects that included geographical coordinates.
 Based on certain conditions we had to display a specific
 list of nearby objects. We simply implemented some Great
 Circle calculations such as the distance between points [1]
 and aggregated nearby objects and sent them to our index.
 The drawback is that for each addition to the index, you'd
 have to recalculate all other nearby objects, which takes a
 while. The good thing is, in production, the system isn't
 slowed down by these calculations so it's very fast.

 [1]: http://williams.best.vwh.net/avform.htm#Dist

 Cheers,
 -Original message-
 From: Dennis Gearon gear...@sbcglobal.net
 Sent: Mon 20-09-2010 19:42
 To: solr-user@lucene.apache.org;
 
 Subject: Re: Calculating distances in Solr using longitude
 latitude
 
 Hmmm,
     I am about to put a engineer on our search engine
 requirements with the assumption that latitude/longitude is
 available in the current release of Solr, (not knowing what
 that is). 
 
     I have been partitioning the whole Solr thing to
 him,except enough info for me to understand and interface to
 his work. So, I don't have that answer.
 
     Can someone else answer him?
 
 
 Dennis Gearon
 
 Signature Warning
 
 EARTH has a Right To Life,
  otherwise we all die.
 
 Read 'Hot, Flat, and Crowded'
 Laugh at http://www.yert.com/film.php

 
 
 --- On Mon, 9/20/10, PeterKerk vettepa...@hotmail.com
 wrote:
 
  From: PeterKerk vettepa...@hotmail.com
  Subject: Re: Calculating distances in Solr using
 longitude latitude
  To: solr-user@lucene.apache.org
  Date: Monday, September 20, 2010, 6:53 AM
  
  Hi Dennis,
  
  Good suggestion, but I see that most of that is Solr
 4.0
  functionality,
  which has not been released yet.
  How can I still use the longitude latitude
 functionality
  (LatLonType)?
  
  Thanks!
  -- 
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html

  Sent from the Solr - User mailing list archive at
  Nabble.com.
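
For reference, the Great Circle distance calculation Markus mentions above
[1] can be sketched in Java (haversine form; illustrative code, not from the
thread):

// Great-circle distance in km between two points given in decimal degrees.
static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    final double R = 6371.0; // mean Earth radius in km
    double dLat = Math.toRadians(lat2 - lat1);
    double dLon = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
             + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
               * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}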
  



Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Lance Norskog
There is a third-party add-on for Solr 1.4 called LocalSolr. It has a 
different API than the upcoming SpatialSearch stuff, and will probably 
not live on in future releases.


The LatLonType stuff is definitely only on the trunk, not even 3.x.

PeterKerk wrote:

Hi Dennis,

Good suggestion, but I see that most of that is Solr 4.0 functionality,
which has not been released yet.
How can I still use the longitude latitude functionality (LatLonType)?

Thanks!
   


Re: Solr for statistical data

2010-09-20 Thread Lance Norskog

Does this do what you want?

http://wiki.apache.org/solr/StatsComponent

I can see that group by is a possible enhancement to this component.
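
If memory serves, the 1.4 StatsComponent already gets part of the way there
via stats.facet, which breaks the statistics down per facet value (field
names illustrative):

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=amount&stats.facet=region

That returns sum, count, min, max, etc. for amount once per distinct value
of region.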

Kjetil Ødegaard wrote:

Hi all,


we're currently using Solr 1.4.0 in a project for statistical data, where we
group and sum a number of double values. Probably not what most people use
Solr for, but it seems to be working fine for us :-)


We do have some challenges, especially with memory use, so I thought I'd
check here if anybody has done something similar.


Some details:


- The index is currently around 30 GB and growing. The data is indexed
directly from a database, each row ends up as a document. I think we have
around 100 million documents now, the largest core is about 40 million. The
data is split in different cores for different statistics data.


- Heap size is currently 4 GB. We're currently running all the cores in a
single JVM on WebSphere (WAS) 6.1. We have a couple of GB left for OS disk
cache. Initially we used a 1 GB heap, so we had to split cores in different
shards in order to avoid OutOfMemoryErrors because of the FieldCache (I
think).


- The grouping is done by a custom Solr component which takes parameters
that specify which fields to group by (like in SQL) and sums up values for
the group. This uses the FieldCache for speedy retrieval. We did a PoC on
using Documents instead, but this seemed to go a lot slower. I've done a
memory dump and the combined FieldCache looks to be about 3 GB (taken with a
grain of salt since I'm not sure all the data was cached).


I guess this is different from normal Solr searches since we have to process
all the documents in a core in order to calculate results, we can't just
return the first 10 (or whatever) documents.


Any tips or similar experiences?



---Kjetil

   


Re: Solr Analyzer results before the actual query.

2010-09-20 Thread Lance Norskog
Yes. Look at the JSP page solr/admin/analysis.jsp. It makes calls to 
Solr that do exactly what you want; they use the AnalysisComponent.
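
A hedged example of driving this from code, assuming the
FieldAnalysisRequestHandler is registered at /analysis/field as in the 1.4
example solrconfig.xml (field type and values are illustrative):

http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Foo+Bar&analysis.query=foo

The response lists the token stream after each tokenizer/filter stage, which
client code can inspect before deciding whether to issue the real query.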


Lance

zackko wrote:

Hi to all the Forum from a new subscriber,

I’m working on the server-side search solution of the company where I’m
currently employed. I have a problem at the moment: when I submit
a search to Solr I want to see the “Analyzer results” for the search terms
(query), with all the filters applied as defined in types.xml. I want the
analyzer output displayed BEFORE the actual search is performed, so I can
decide at that point whether to run the proper search or leave the user
with no results.

The problem is more or less described in this issue:
https://issues.apache.org/jira/browse/SOLR-261. In summary: is it possible
to have the analyzer results (in code) before running the actual Solr
search?

I'm quite new to Solr, so maybe this issue has already been discussed in
another thread, but I'm unable to find it at the moment. If anybody has
any clue on how to do that, any suggestion will be more than welcome.

Thanks very much in advance for your answer.

Best wishes.