Re: Spellcheck: java.lang.RuntimeException: java.io.IOException: read past EOF

2009-11-23 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 1:14 AM, ranjitr wrote:

>
> Hello,
>
> Solr 1.3 reported the following error when our app tried to query it:
>
> java.lang.RuntimeException: java.io.IOException: read past EOF
>at
>
> org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:91)
>at
>
> org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:108)
> .
>
>
Can you post the complete stack trace (i.e. the underlying exception's stack
trace as well)?

> When this error occurred, our solrconfig.xml had spellcheck.build set to
> true. This was a configuration error on our part. I was wondering if the
> spellcheck index being re-built for each query could have caused the above
> exception to occur.
>
>
I don't know. Rebuilding the spellcheck index for each query is not a good
idea anyway.
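For reference, rather than passing spellcheck.build=true on every request, the index-based spellchecker can be told to rebuild itself on commit. A rough solrconfig.xml sketch (the component name, source field, and index dir are illustrative, not taken from the poster's config):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- illustrative source field for the dictionary -->
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild on commit instead of on every query -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With this in place, spellcheck.build can be dropped from the query parameters entirely.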

-- 
Regards,
Shalin Shekhar Mangar.


Re: schema-based Index-time field boosting

2009-11-23 Thread Michael Lackhoff
On 23.11.2009 19:33 Chris Hostetter wrote:

> ...if there was a way to boost fields at index time that was configured in 
> the schema.xml, then every doc would get that boost on its instances of 
> those fields but the only purpose of index time boosting is to indicate 
> that one document is more significant than another doc -- if every doc 
> gets the same boost, it becomes a No-OP.
> 
> (think about the math -- field boosts become multipliers in the fieldNorm 
> -- if every doc gets the same multiplier, then there is no net effect)

Coming in a bit late but I would like a variant that is not a No-OP.
Think of something like title:searchstring^10 OR catch_all:searchstring
Of course I can always add the boosting at query time but it would make
life easier if I could define a default boost in the schema so that my
query could just be title:searchstring OR catch_all:searchstring
but still get the boost for the title field.

Thinking this further it would be even better if it was possible to
define one (or more) fallback field(s) with associated boost factor in
the schema. Then it would be enough to query for title:searchstring and
it would be automatically expanded to e.g.
title:searchstring^10 OR title_other_language:searchstring^5 OR
catchall:searchstring
or whatever you define in the schema.
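Until something like that exists in the schema, a close server-side approximation is a dismax handler whose qf carries the default boosts, so the client can send a bare query and still get the title weighting. A sketch for solrconfig.xml (handler name and field names are illustrative):

```xml
<requestHandler name="/select-boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- every plain query searches these fields with these boosts -->
    <str name="qf">title^10 title_other_language^5 catchall</str>
  </lst>
</requestHandler>
```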

-Michael




Re: Implementing phrase autopop up

2009-11-23 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 10:12 AM, darniz  wrote:

>
> hello all,
> Let me first explain the task I am trying to do.
> I have articles with titles, for example:
> 
> Car Insurance for Teenage Drivers
>
> A Total Loss?
> 
> If a user begins to type "car insu" I want the autocomplete to pop up with
> the entire phrase.
> There are two ways to implement this.
> First is to use the TermsComponent and the other is to use a field whose
> field type uses the solr.EdgeNGramFilterFactory filter.
>
> I started with the Terms component: I declared a terms request
> handler and gave the following query
>
> http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
> The issue is that it's not giving the entire phrase; it gives me back
> results like car, caravan, carbon. Now I know using terms.prefix will only
> give me results where the title starts with car. On top of this I also want
> that if a word like car appears somewhere in the middle of the title, it
> should show up in the autocomplete too, very much like Google, where a word
> need not be at the beginning but can be anywhere in the middle of the
> title.
> The question is whether TermsComponent is a good candidate, or whether to
> use a custom field (say autoPopupText) with a field type configured with
> the EdgeNGramFilterFactory filter, copying the title to the autoPopupText
> field and using it to power the autocomplete.
>
> The other thing is that using EdgeNGramFilterFactory is more of an
> index-time approach: when you index a document you need to know which
> fields you want to copy to the autoPopupText field, whereas with the Terms
> component you can define at query time what fields you want to use to fetch
> autocomplete from.
>
> Any idea what's best, and why the Terms component is not giving me the
> entire phrase which I mentioned earlier?
> FYI
> my title field is of type text.
>


You are using a tokenized field type with TermsComponent, therefore each word
in your phrase gets indexed as a separate token. You should use a
non-tokenized type (such as a string type) with TermsComponent. However,
this will only let you search by prefix and not by words in the middle of
the phrase.
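To make the difference concrete, here is a toy sketch (plain Java, not Lucene's actual filter code) of the edge n-grams an EdgeNGramFilterFactory-style analyzer would emit per token; because every word gets its own grams, a prefix typed mid-phrase such as "insu" can still match:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramDemo {
    // Emit edge n-grams of min..max characters for each whitespace-separated
    // token, roughly what an edge n-gram filter produces on a tokenized field.
    static List<String> edgeNGrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            for (int n = min; n <= Math.min(max, token.length()); n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Grams indexed for the title "Car Insurance":
        System.out.println(edgeNGrams("Car Insurance", 3, 5));
        // prints [car, ins, insu, insur]
    }
}
```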

Your best bet here would be to use EdgeNGramFilterFactory. If your index is
very large, you can consider doing a prefix search on shingles too.
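A schema.xml field type along those lines might look like the sketch below (field type name, tokenizer, and gram sizes are illustrative; note the filter class name ends in Factory):

```xml
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes of each token so mid-phrase words match -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A copyField from title into a field of this type would then power the autocomplete.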

-- 
Regards,
Shalin Shekhar Mangar.


Re: auto-completion preview?

2009-11-23 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 10:39 AM, Paul Libbrecht wrote:

>
> Hello Solr users,
>
> is there a live demo of the auto-completion feature somewhere?
>
> thanks in advance
>

Well, there is no preview but I can give you a couple of live instances:

   1. http://autos.aol.com/
   2. http://travel.aol.com/

Try typing something into the top most search box.

-- 
Regards,
Shalin Shekhar Mangar.


Re: help with dataimport delta query

2009-11-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess the field names do not match:
in the deltaQuery you are selecting the field id,

but in the deltaImportQuery you use the field as
${dataimporter.delta.job_jobs_id}
I guess it should be ${dataimporter.delta.id}
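Putting that together with the config Joel posted, the entity would look roughly like this (note the deltaImportQuery variable now matches the id column returned by deltaQuery, the missing WHERE is restored, and the stray fm. alias is dropped — a sketch, not a tested config):

```xml
<entity name="item" pk="id"
        query="SELECT f.id, f.title FROM Book f WHERE f.inMyList = 1"
        deltaQuery="SELECT id FROM Book
                    WHERE inMyList = 1
                      AND lastModifiedDate > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT f.id, f.title FROM Book f
                          WHERE f.id = '${dataimporter.delta.id}'">
</entity>
```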

On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund  wrote:
> Hi, I have solr all working nicely, except I'm trying to get deltas to work
> on my data import handler.
>
> Here is a simplification of my data import config. I have a table called
> "Book" which has categories; I'm doing subqueries for the category info and
> calling a javascript helper. This all works perfectly for the regular query.
>
> I added these lines for the delta stuff:
>
>        deltaImportQuery="SELECT f.id,f.title
>                        FROM Book f
>                        f.id='${dataimporter.delta.job_jobs_id}'"
>                deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> lastModifiedDate > '${dataimporter.last_index_time}'"  >
>
> basically I'm trying to get rows whose lastModifiedDate is newer than the
> last index (or delta index).
>
> I run:
> http://localhost:8983/solr/dataimport?command=delta-import
>
> And it says in logs:
>
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Starting delta collection.
> Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=delta-import}
> status=0 QTime=0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: category
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: category
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: item
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: item
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Delta Import completed successfully
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> execute
> INFO: Time taken = 0:0:0.21
>
> But the browser says no documents added/modified (even though one record in
> db is a match)
>
> Is there a way to turn on debugging so I can see the queries the DIH is
> sending to the db?
>
> Any other ideas of what I could be doing wrong?
>
> thanks
> Joel
>
>
> 
>          query="SELECT f.id, f.title
>                FROM Book f
>                WHERE f.inMyList=1"
>                deltaImportQuery="SELECT f.id,f.title
>                        FROM Book f
>                        f.id='${dataimporter.delta.job_jobs_id}'"
>                deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> lastModifiedDate > '${dataimporter.last_index_time}'"  >
>
>           
>           
>                 transformer="script:SplitAndPrettyCategory" query="select fc.bookId,
> group_concat(cr.name) as categoryName,
>                 from BookCat fc
>                 where fc.bookId = '${item.id}' AND
>                 group by fc.bookId">
>                 
>                 
>    
>   
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


auto-completion preview?

2009-11-23 Thread Paul Libbrecht


Hello Solr users,

is there a live demo of the auto-completion feature somewhere?

thanks in advance

paul


Implementing phrase autopop up

2009-11-23 Thread darniz

hello all,
Let me first explain the task I am trying to do.
I have articles with titles, for example:

Car Insurance for Teenage Drivers

A Total Loss?

If a user begins to type "car insu" I want the autocomplete to pop up with
the entire phrase.
There are two ways to implement this.
First is to use the TermsComponent and the other is to use a field whose
field type uses the solr.EdgeNGramFilterFactory filter.

I started with the Terms component: I declared a terms request
handler and gave the following query

http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
The issue is that it's not giving the entire phrase; it gives me back results
like car, caravan, carbon. Now I know using terms.prefix will only give me
results where the title starts with car. On top of this I also want that if
a word like car appears somewhere in the middle of the title, it should show
up in the autocomplete too, very much like Google, where a word need not be
at the beginning but can be anywhere in the middle of the title.
The question is whether TermsComponent is a good candidate, or whether to use
a custom field (say autoPopupText) with a field type configured with the
EdgeNGramFilterFactory filter, copying the title to the autoPopupText
field and using it to power the autocomplete.

The other thing is that using EdgeNGramFilterFactory is more of an
index-time approach: when you index a document you need to know which
fields you want to copy to the autoPopupText field, whereas with the Terms
component you can define at query time what fields you want to use to fetch
autocomplete from.

Any idea what's best, and why the Terms component is not giving me the
entire phrase which I mentioned earlier?
FYI
my title field is of type text.
Thanks
darniz

-- 
View this message in context: 
http://old.nabble.com/Implementing-phrase-autopop-up-tp26490419p26490419.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread Grant Ingersoll

On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:

> 
> *:* returns me 1 count but when I search for specific word (which was part of
> .txt file I indexed before) it doesn't return me anything. I don't have luke
> setup on my end.

http://localhost:8983/solr/admin/luke should give you some info.


> let me see if I can set that up quickly but otherwise do
> you see anything I am missing in solrconfig mapping or something?

What's your schema look like and how are you querying?

> which maps
> document "content" to wrong attribute?
> 
> thanks,
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
>>> server.request(up); 
>>> 
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search doesn't
>>> return me anything.
>> 
>> What do your logs show?  Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>> 
>> Also, I don't think you need ext.literal anymore, just literal.
>> 
>>> 
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>> 
>>> >> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>   
>>> description
>>> description
>>>   
>>> 
>>> 
>>> Clearly it seems I am missing something. Any idea?
>> 
>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
>> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: help with dataimport delta query

2009-11-23 Thread Joel Nylund
Got to love it when Yahoo thinks your own mail is spam. Anyone have  
any ideas how to get logging to work with 1.4?


I went to the admin panel and set all logging to finest.

In my jetty stdout I see no SQL for any of the dataimport handler  
runs. I see


Nov 23, 2009 9:26:27 PM  
org.apache.solr.handler.dataimport.JdbcDataSource$1 call

INFO: Time taken for getConnection(): 6
Nov 23, 2009 9:26:32 PM  
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity category with URL: jdbc:mysql:// 
localhost/feeddb
Nov 23, 2009 9:26:32 PM  
org.apache.solr.handler.dataimport.JdbcDataSource$1 call

INFO: Time taken for getConnection(): 5


But no SQL. From looking at the source, it looks like it should be  
logging the SQL if I'm in debug mode.


any ideas, I think I am losing my mind.

my full import works, but the delta does nothing

thanks
Joel



On Nov 23, 2009, at 2:49 PM, Joel Nylund wrote:

Hi, I have solr all working nicely, except I'm trying to get deltas  
to work on my data import handler.


Here is a simplification of my data import config. I have a table  
called "Book" which has categories; I'm doing subqueries for the  
category info and calling a javascript helper. This all works  
perfectly for the regular query.


I added these lines for the delta stuff:

deltaImportQuery="SELECT f.id,f.title
FROM Book f
f.id='${dataimporter.delta.job_jobs_id}'"
		deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND  
lastModifiedDate > '${dataimporter.last_index_time}'"  >


basically I'm trying to get rows whose lastModifiedDate is newer than  
the last index (or delta index).


I run:
http://localhost:8983/solr/dataimport?command=delta-import

And it says in logs:

Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DataImporter doDeltaImport

INFO: Starting Delta Import
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties

INFO: Read dataimport.properties
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder doDelta

INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=delta-import}  
status=0 QTime=0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Running ModifiedRowKey() for Entity: category
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed parentDeltaQuery for Entity: category
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Running ModifiedRowKey() for Entity: item
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed parentDeltaQuery for Entity: item
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder doDelta

INFO: Delta Import completed successfully
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder execute

INFO: Time taken = 0:0:0.21

But the browser says no documents added/modified (even though one  
record in db is a match)


Is there a way to turn on debugging so I can see the queries the DIH is  
sending to the db?


Any other ideas of what I could be doing wrong?

thanks
Joel



   		deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND  
lastModifiedDate > '${dataimporter.last_index_time}'"  >


  
  
		transformer="script:SplitAndPrettyCategory" query="select fc.bookId,  
group_concat(cr.name) as categoryName,

 from BookCat fc
 where fc.bookId = '${item.id}' AND
 group by fc.bookId">
 
 
   
  






Webinar: An Introduction to Basics of Search and Relevancy with Apache Solr hosted by Lucid Imagination

2009-11-23 Thread Tom Hill
In this introductory technical presentation, renowned search expert Mark
Bennett, CTO of search consultancy New Idea Engineering, will present
practical tips and examples to help you quickly get productive
with Solr, including:

* Working with the "web command line" and controlling your inputs and
outputs
* Understanding the DISMAX parser
* Using the Explain output to tune your results relevance
* Using the Schema browser

Wednesday, December 2, 2009
11:00am PST / 2:00pm EST

Click here to sign up:
http://www.eventsvc.com/lucidimagination/120209?trk=WR-DEC2009-AP


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

FYI: weirdly it's returning me the following when I run
rsp.getResults().get(0).getFieldValue("description")

[702, text/plain, doc123.txt, ]

so it seems like it's storing

up.setParam("ext.literal.docName", "doc123.txt") into the "description"
attribute instead of the file content.

Any idea?

Thanks,

javaxmlsoapdev wrote:
> 
> *:* returns me 1 count but when I search for specific word (which was part
> of .txt file I indexed before) it doesn't return me anything. I don't have
> luke setup on my end. let me see if I can set that up quickly but
> otherwise do you see anything I am missing in solrconfig mapping or
> something? which maps document "content" to wrong attribute?
> 
> thanks,
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
>>> server.request(up); 
>>> 
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search
>>> doesn't
>>> return me anything.
>> 
>> What do your logs show?  Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>> 
>> Also, I don't think you need ext.literal anymore, just literal.
>> 
>>> 
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>> 
>>> >> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>
>>>  description
>>>  description
>>>
>>>  
>>> 
>>> Clearly it seems I am missing something. Any idea?
>> 
>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487409.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

*:* returns me 1 count, but when I search for a specific word (which was part
of the .txt file I indexed before) it doesn't return me anything. I don't have
Luke set up on my end. Let me see if I can set that up quickly, but otherwise
do you see anything I am missing in the solrconfig mapping or something which
maps
document "content" to wrong attribute?

thanks,

Grant Ingersoll-6 wrote:
> 
> 
> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
> 
>> 
>> Following code is from my test case where it tries to index a file (of
>> type
>> .txt)
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978"); //key is the uniqueId
>> up.setParam("ext.literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);   
>> server.request(up);  
>> 
>> test case doesn't give me any error and "I think" its indexing the file?
>> but
>> when I search for a text (which was part of the .txt file) search doesn't
>> return me anything.
> 
> What do your logs show?  Else, what does Luke show or doing a *:* query
> (assuming this is the only file you added)?
> 
> Also, I don't think you need ext.literal anymore, just literal.
> 
>> 
>> Following is the config from solrconfig.xml where I have mapped content
>> to
>> "description" field(default search field) in the schema.
>> 
>> > class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>
>>  description
>>  description
>>
>>  
>> 
>> Clearly it seems I am missing something. Any idea?
> 
> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Announcing the Apache Solr extension in PHP - 0.9.0

2009-11-23 Thread Michael Lugassy
Sweeet! you guys rock.

On Mon, Nov 23, 2009 at 11:12 PM, Thanh Doan  wrote:
> Thanks Israel
>
> I plan to try it and compare with rsolr
>
>
> On Nov 23, 2009, at 2:28 PM, Michael Lugassy  wrote:
>
>> Thanks Israel, exactly what I was looking for, but how would one get a
>> pre-compiled dll for windows? using PHP 5.3 VS9 TS.
>>
>> On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo  wrote:
>>>
>>> Fellow Apache Solr users,
>>>
>>> I have been working on a PHP extension for Apache Solr in C for quite
>>> some time now.
>>>
>>> I just finished testing it and I have completed the initial user level
>>> documentation of the API
>>>
>>> Version 0.9.0-beta has just been released.
>>>
>>> It already has built-in readiness for Solr 1.4
>>>
>>> If you are using Solr 1.3 or later in PHP, I would appreciate if you
>>> could
>>> check it out and give me some feedback.
>>>
>>> It is very easy to install on UNIX systems. I am still working on the
>>> build
>>> for windows. It should be available for Windows soon.
>>>
>>> http://solr.israelekpo.com/manual/en/solr.installation.php
>>>
>>> A quick list of some of the features of the API include :
>>> - Built in serialization of Solr Parameter objects.
>>> - Reuse of HTTP connections across repeated requests.
>>> - Ability to obtain input documents for possible resubmission from query
>>> responses.
>>> - Simplified interface to access server response data (SolrObject)
>>> - Ability to connect to Solr server instances secured behind HTTP
>>> Authentication and proxy servers
>>>
>>> The following components are also supported
>>> - Facets
>>> - MoreLikeThis
>>> - TermsComponent
>>> - Stats
>>> - Highlighting
>>>
>>> Solr PECL Extension Homepage
>>> http://pecl.php.net/package/solr
>>>
>>> Some examples are available here
>>> http://solr.israelekpo.com/manual/en/solr.examples.php
>>>
>>> Interim Documentation Page until refresh of official PHP documentation
>>> http://solr.israelekpo.com/manual/en/book.solr.php
>>>
>>> The C source is available here
>>> http://svn.php.net/viewvc/pecl/solr/
>>>
>>> --
>>> "Good Enough" is not good enough.
>>> To give anything less than your best is to sacrifice the gift.
>>> Quality First. Measure Twice. Cut Once.
>>>
>



-- 
Sent from my mobile


Re: Oddness with Phrase Query

2009-11-23 Thread Simon Wistow
On Mon, Nov 23, 2009 at 12:10:42PM -0800, Chris Hostetter said:
> ...hmm, you shouldn't have to reindex everything.  are you sure you 
> restarted solr after making the enablePositionIncrements="true" change to 
> the query analyzer?

Yup - definitely restarted
 
> what do the offsets look like when you go to analysis.jsp and paste in that 
> sentence?

org.apache.solr.analysis.StopFilterFactory 
{words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}

term position:     1     4
term text:         Here  Dragons
term type:         word  word
source start,end:  0,4   14,21
payload:


> the other thing to consider: you can increase the slop value on that
> phrase query (to allow looser matching) using the "qs" param (query slop) 
> ... that could help in this situation (stop words getting stripped out of 
> the query) as well as other situations (ie: what if the user just types 
> "here be dragons" -- with or without stop words)

After fiddling with the position increments stuff I upped the query 
slop to 2, which seems to now provide better results, but I'm worried 
about that affecting relevance elsewhere (which I presume is the reason 
why it's not the default value).

If that's the case - is it worth writing something for my app so that if 
it detects a phrase query with lots of stop words it ups the phrase 
slop?
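With dismax, the phrase slop is just a request parameter, so an app can raise it only for queries that look stop-word-heavy rather than globally. An illustrative request (URL-encoding omitted):

```text
http://localhost:8983/solr/select?defType=dismax&q="here be dragons"&qs=2
```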

Either way it seems to be working now  - thanks for all the help,

Simon



Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread Grant Ingersoll

On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:

> 
> Following code is from my test case where it tries to index a file (of type
> .txt)
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978"); //key is the uniqueId
> up.setParam("ext.literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> server.request(up);   
> 
> test case doesn't give me any error and "I think" its indexing the file? but
> when I search for a text (which was part of the .txt file) search doesn't
> return me anything.

What do your logs show?  Else, what does Luke show or doing a *:* query 
(assuming this is the only file you added)?

Also, I don't think you need ext.literal anymore, just literal.

> 
> Following is the config from solrconfig.xml where I have mapped content to
> "description" field(default search field) in the schema.
> 
>  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>
>  description
>  description
>
>  
> 
> Clearly it seems I am missing something. Any idea?



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: Embedded solr with third party libraries

2009-11-23 Thread Lance Norskog
To deploy the Lucid KStem stemmer, copy these two files:
lucid-kstem.jar
lucid-solr-kstem.jar
to the lib/ directory in your running solr instance.

In the <fieldType> declaration for a text field, you would change this line:

to this:


(Remember that you have to make this change in both the query and
analysis sections of the fieldType specification.)

Now, to verify the change, restart solr and go to the analysis.jsp test page:
http://localhost:8983/solr/admin/analysis.jsp

Let's say you changed the 'text' type and left 'textTight' using
PorterStemmer. Change the Field name/type drop-down to 'type' and type
'text' in the top box. Now type 'changing' in the "Field Value" box
and click 'Analyze'. The bottom of the page will now show that
'changing' was stemmed to 'change'. If you change the field type from
'text' to 'textTight' and try again, 'changing' will be stemmed to
'chang' by the original PorterStemmer.

On Mon, Nov 23, 2009 at 12:23 PM, Chris Hostetter
 wrote:
>
> : distribution. When we run test cases our schema.xml has a definition for lucid
> : kstem and it throws a ClassNotFound exception.
> : We declared the dependency for the two jars lucid-kstem.jar and
> : lucid-solr-kstem.jar but still it throws an error.
>
> explain what you mean by "declared the dependency"?
>
> : 
> C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr\conf\schema.xml
> :
> : Now in order for the jar to be loaded should i copy the two jars to solr/lib
> : directory. is that the default location embedded solr looks into for some
> : default jars.
>
> assuming "C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr"
> is your solr home dir, then yes you can copy your jars into
> "C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr\lib"
> and that should work ... or starting in Solr 1.4 you can use the <lib>
> directive to specify a jar anywhere on disk.  see the example
> solrconfig.xml for the syntax.
>
>
>
> -Hoss
>
>
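The Solr 1.4 directive mentioned in the quoted reply is written in solrconfig.xml roughly as follows (paths are illustrative):

```xml
<!-- load specific jars from anywhere on disk -->
<lib path="/opt/lucid/lucid-kstem.jar" />
<lib path="/opt/lucid/lucid-solr-kstem.jar" />
```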



-- 
Lance Norskog
goks...@gmail.com


ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-23 Thread javaxmlsoapdev

Following code is from my test case where it tries to index a file (of type
.txt)
ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); //key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
server.request(up); 

test case doesn't give me any error and "I think" it's indexing the file, but
when I search for a text (which was part of the .txt file) the search doesn't
return me anything.

Following is the config from solrconfig.xml where I have mapped content to
"description" field(default search field) in the schema.



  description
  description

  

Clearly it seems I am missing something. Any idea?
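For comparison, a typical Solr 1.4 extracting-handler registration looks roughly like the sketch below; the fmap.content mapping into "description" is an assumption based on the description in the message, not the poster's verbatim config:

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- assumed: map extracted body text into the "description" field -->
    <str name="fmap.content">description</str>
  </lst>
</requestHandler>
```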

Thanks,
-- 
View this message in context: 
http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26486817.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Complex multi-value boosting

2009-11-23 Thread Stephen Duncan Jr
On Mon, Nov 23, 2009 at 3:39 PM, Michael Lugassy  wrote:

> Guys --
>
> What schema will you use for 500K docs with a variety of 0-30
> different category ids, each carrying its own weight and completely
> overriding the default scoring?
>
> For example, these documents:
> A: 1:0.21, 2:0.41, 3:0.15 ...
> B: 1:0.18, 2:0.65 4:0.98 ...
> C: 6:0.75 ...
> D: 2:0.14 ...
>
> When searching "1" I'd like document A to appear first (has 0.21) and
> when searching "1 || 2" i'd like document B to appear first (has an
> aggregate score of 0.83 vs. 0.62). Currently I run this with full-text
> after artificially repeating the number of each category's weight
> (i.e. "1" would appear 21 times on a text field) - is there a better
> way?
>
> Best,
>
> -- Michael
>

It sounds to me like you want to use payloads (the same issue I had
recently):
http://old.nabble.com/Customizing-Field-Score-%28Multivalued-Field%29-tp26182254p26182254.html

That thread has some details on the eventual implementation I chose.  Let me
know if you have any questions.  Note that I did use the scoring as a boost,
not "completely overriding the default scoring", but I think the impact is
basically the same, and I was satisfied it was good enough.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: access denied to solr home lib dir

2009-11-23 Thread Chris Hostetter

: Check.  I even verified that the tomcat user could create the
: directory (i.e. "sudo -u tomcat6 mkdir /opt/solr/steve/lib").  Still
: solr complains.

Note that you have an AccessControlException, not a simple 
FileNotFoundException ... the error here is coming from File.canRead (when 
Solr is asking if it has permission to read the file) but your 
ServletContainer evidently has a security policy in place that prevents 
solr from even checking (if the security policy allowed it to check, then 
it would return true/false based on the actual file permissions)...

http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#canRead%28%29

Tests whether the application can read the file denoted by this 
abstract pathname.

Returns:
true if and only if the file specified by this abstract pathname 
exists and can be read by the application; false otherwise 
Throws:
SecurityException - If a security manager exists and its
SecurityManager.checkRead(java.lang.String) method denies read
access to the file

...note that Tomcat doesn't have any special SecurityManager settings that 
prevent this by default.  something about your tomcat deployment must be 
specifying specific Security Permission rules.
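If the security manager must stay enabled, the usual fix is to grant the read 
permission in the container's Java policy file. This is only a sketch; the 
policy file location and exact grant depend on how your Tomcat was packaged 
(some distro packages enforce a security manager by default):

```
// hypothetical addition to the container's policy file; adjust paths to your solr home
grant {
  permission java.io.FilePermission "/opt/solr/steve/lib", "read";
  permission java.io.FilePermission "/opt/solr/steve/lib/-", "read";
};
```

The "/-" form grants read access to everything below the lib directory, not 
just the directory entry itself.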

: >> Caused by: java.security.AccessControlException: access denied
: >> (java.io.FilePermission /opt/solr/steve/./lib read)
: >>       at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
: >>       at 
java.security.AccessController.checkPermission(AccessController.java:546)
: >>       at 
java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
: >>       at java.lang.SecurityManager.checkRead(SecurityManager.java:871)
: >>       at java.io.File.canRead(File.java:689)
: >>       at 
org.apache.solr.core.SolrResourceLoader.replaceClassLoader(SolrResourceLoader.java:157)
: >>       at 
org.apache.solr.core.SolrResourceLoader.addToClassLoader(SolrResourceLoader.java:128)
: >>       at 
org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:97)
: >>       at 
org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:195)
: >>       at org.apache.solr.core.Config.(Config.java:93)
: >>       at 
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:65)
: >>       ... 40 more


-Hoss


Re: Announcing the Apache Solr extension in PHP - 0.9.0

2009-11-23 Thread Thanh Doan

Thanks Israel

I plan to try it and compare with rsolr


On Nov 23, 2009, at 2:28 PM, Michael Lugassy  wrote:


Thanks Israel, exactly what I was looking for, but how would one get a
pre-compiled dll for windows? using PHP 5.3 VS9 TS.

On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo   
wrote:

Fellow Apache Solr users,

I have been working on a PHP extension for Apache Solr in C for quite
sometime now.

I just finished testing it and I have completed the initial user level
documentation of the API.

Version 0.9.0-beta has just been released.

It already has built-in readiness for Solr 1.4.

If you are using Solr 1.3 or later in PHP, I would appreciate if you could
check it out and give me some feedback.

It is very easy to install on UNIX systems. I am still working on the build
for windows. It should be available for Windows soon.

http://solr.israelekpo.com/manual/en/solr.installation.php

A quick list of some of the features of the API include :
- Built in serialization of Solr Parameter objects.
- Reuse of HTTP connections across repeated requests.
- Ability to obtain input documents for possible resubmission from query
responses.
- Simplified interface to access server response data (SolrObject)
- Ability to connect to Solr server instances secured behind HTTP
Authentication and proxy servers

The following components are also supported
- Facets
- MoreLikeThis
- TermsComponent
- Stats
- Highlighting

Solr PECL Extension Homepage
http://pecl.php.net/package/solr

Some examples are available here
http://solr.israelekpo.com/manual/en/solr.examples.php

Interim Documentation Page until refresh of official PHP documentation
http://solr.israelekpo.com/manual/en/book.solr.php

The C source is available here
http://svn.php.net/viewvc/pecl/solr/

--
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.



Re: [N to M] range search out of sum of field. howto search this?

2009-11-23 Thread Chris Hostetter

: fq={!frange l=5 u=10}sum(user,num)

Hmmm, one of us massively misunderstood the original question - and i'm 
pretty sure it's Yonik.

i don't think he wants results where the user field plus the num field is 
in the range of 5-10 ... i think he wants the list of user ids (which are 
numbers in his examples, but could just as easily be strings) where the 
sum of the "num" fields across all documents sharing the same value in the 
"user" field falls within the requested range.

I can't think of any easy way to do that ... it isn't the kind of thing an 
Inverted Index is particularly good at.  but maybe there's something in 
the Field Collapsing patch (searching the archives/wiki will bring up 
pointers) that can filter on stats like this?
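If the data set (or the per-user facet counts) is small enough to pull back 
to the client, one fallback is doing the grouping and range check outside 
Solr. A self-contained sketch of just that aggregation logic in plain Java 
-- no Solr APIs involved, and the sample data mirrors Julian's example:

```java
import java.util.*;

public class SumRangeFilter {

    // Return the user ids whose summed "num" values fall inside [lo, hi].
    static List<String> usersInRange(Map<String, int[]> numsByUser, int lo, int hi) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, int[]> e : numsByUser.entrySet()) {
            int sum = 0;
            for (int n : e.getValue()) sum += n;       // aggregate per user
            if (sum >= lo && sum <= hi) matches.add(e.getKey());
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, int[]> docs = new LinkedHashMap<>();
        docs.put("1", new int[]{5, 8});  // sums to 13 -> outside 5..10
        docs.put("5", new int[]{7, 1});  // sums to 8  -> inside 5..10
        System.out.println(usersInRange(docs, 5, 10)); // prints [5]
    }
}
```

For large result sets this obviously doesn't scale, which is why the Field 
Collapsing / stats route is worth investigating first.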

: On Mon, Nov 23, 2009 at 8:49 AM, Julian Davchev  wrote:
: > Hi folks,
: > I got documents like
: > user:1   num:5
: > user:1   num: 8
: > user:5   num:7
: > user:5   num:1
: > 
: >
: >
: > I'd like to get per user that maches sum of num range 5 to 10
: > In this case it should return user 5  as 7+1=8 and is within range.
: > User 1 will be false cause sum of num is 5+8=13 hence outside range 5 to 10

-Hoss


Re: Boost document base on field length

2009-11-23 Thread Chris Hostetter

: > I would like to boost documents with longer descriptions to move down 
documents with 0 length description,
: > I'm wondering if there is possibility to boost document basing on the field 
length while searching or the only way is to store field length as an int in a 
separate field while indexing?
: 
: Override the default Similarity (see the end of the schema.xml file) 
: with your own Similarity implementation and then in that class override 
: the lengthNorm() method.


I think i'm reading the question differently than Grant -- his suggestion 
applies when you are searching in the description field, and don't want 
documents with shorter descriptions to score higher when the same terms 
match the same number of times (the default behavior of lengthNorm)

my understanding is that you want documents that don't have a description 
to score lower than documents that do -- and you might be querying against 
completely different fields (description might not even be indexed)

in that case there is no easy way to achieve this with just the 
description field ... the easy thing to do is to index a boolean 
"has_description" field and then incorporate that into your query (or as 
the input to a function query)
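A sketch of what that could look like; the field name and the boost-query 
parameter are illustrative, not from the original poster's setup:

```
<!-- schema.xml: populated at index time with true/false -->
<field name="has_description" type="boolean" indexed="true" stored="false"/>
```

With dismax, documents that have a description could then be favored with 
something like bq=has_description:true^2.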


-Hoss



Re: Output all, from one field

2009-11-23 Thread Chris Hostetter

: Do you want to return just one field from all documents? If yes, you can:
: 
:1. Query with q=*:*&fl=name
:2. Use TermsComponent - http://wiki.apache.org/solr/TermsComponent

note that those are very different creatures ... #1 gives you all of the 
stored values for every document.  #2 gives you all of the indexed terms 
(some of which may have all come from a single indexed value)


-Hoss



Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-23 Thread Lance Norskog
This is definitely a bug. Please open a JIRA issue for this.

On Sat, Nov 21, 2009 at 10:53 AM, Bertie Shen  wrote:
> Hey,
>
>  I figured out why we always see the "Indexing failed.
> Rolled back all changes." message.  It is because we need a
> dataimport.properties file at conf/, into which indexing will write the last
> indexing time. Without that file, SolrWriter.java will throw an
> exception and Solr will emit this misleading "Indexing failed.
> Rolled back all changes." output, although indexing actually
> completed successfully.
>
>  I think we need to improve this functionality, at least documentation.
>
> There is one more thing that we need to pay attention to, i.e. we need to
> make dataimport.properties writable by other users, otherwise
> last_index_time will not be written and the error message may still be
> there.
>
> On Fri, Nov 13, 2009 at 9:35 AM, yountod  wrote:
>
>>
>> The process initially completes with:
>>
>>  2009-11-13 09:40:46
>>  Indexing completed. Added/Updated: 20 documents. Deleted
>> 0 documents.
>>
>>
>> ...but then it fails with:
>>
>>  2009-11-13 09:40:46
>>   Indexing failed. Rolled back all changes.
>>   2009-11-13 09:41:10
>>  2009-11-13 09:41:10
>>  2009-11-13 09:41:10
>>
>>
>> 
>> I think it may have something to do with this, which I found by using the
>> DataImport.jsp:
>> 
>> (Thread.java:636) Caused by: java.sql.SQLException: Illegal value for
>> setFetchSize(). at
>> com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864) at
>>
>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:242)
>> ... 28 more
>>
>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26340360.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Announcing the Apache Solr extension in PHP - 0.9.0

2009-11-23 Thread Israel Ekpo
Hi Mike,

Thanks to Pierre, the Windows version of the extension is available here,
compiled from trunk r291135:

http://downloads.php.net/pierre/

I am planning to have 0.9.8 compiled for windows as soon as it is out
sometime later this week.

The 1.0 release should be out sometime before mid December after the API is
finalized and tested.

You can always check the project home page for news about upcoming releases

http://pecl.php.net/package/solr

The documentation is available here
http://www.php.net/manual/en/book.solr.php

Cheers


On Mon, Nov 23, 2009 at 3:28 PM, Michael Lugassy  wrote:

> Thanks Israel, exactly what I was looking for, but how would one get a
> pre-compiled dll for windows? using PHP 5.3 VS9 TS.
>
> On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo  wrote:
> > Fellow Apache Solr users,
> >
> > I have been working on a PHP extension for Apache Solr in C for quite
> > sometime now.
> >
> > I just finished testing it and I have completed the initial user level
> > documentation of the API
> >
> > Version 0.9.0-beta has just been released.
> >
> > It already has built-in readiness for Solr 1.4
> >
> > If you are using Solr 1.3 or later in PHP, I would appreciate if you
> could
> > check it out and give me some feedback.
> >
> > It is very easy to install on UNIX systems. I am still working on the
> build
> > for windows. It should be available for Windows soon.
> >
> > http://solr.israelekpo.com/manual/en/solr.installation.php
> >
> > A quick list of some of the features of the API include :
> > - Built in serialization of Solr Parameter objects.
> > - Reuse of HTTP connections across repeated requests.
> > - Ability to obtain input documents for possible resubmission from query
> > responses.
> > - Simplified interface to access server response data (SolrObject)
> > - Ability to connect to Solr server instances secured behind HTTP
> > Authentication and proxy servers
> >
> > The following components are also supported
> > - Facets
> > - MoreLikeThis
> > - TermsComponent
> > - Stats
> > - Highlighting
> >
> > Solr PECL Extension Homepage
> > http://pecl.php.net/package/solr
> >
> > Some examples are available here
> > http://solr.israelekpo.com/manual/en/solr.examples.php
> >
> > Interim Documentation Page until refresh of official PHP documentation
> > http://solr.israelekpo.com/manual/en/book.solr.php
> >
> > The C source is available here
> > http://svn.php.net/viewvc/pecl/solr/
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> >
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


RE: Multi word synonym problem

2009-11-23 Thread Chris Hostetter

: The response is not searching for Michael Jackson. Instead it is 
: searching for (text:Micheal and text: Jackson).To monitor the parsed 
: query, i turned on debugQuery, but in the present case, the parsed query 
: string was searching Micheal and Jackson separately.

using index time synonyms isn't going to have any effect on how your 
query is parsed.  the Lucene/Solr query parsers use whitespace as 
"markup" and will still analyze each of the "words" in your input 
separately and build up a boolean query containing each of your words 
individually (the only way to change that is to use quotes to force 
"phrase query" behavior where everything in quotes is analyzed as one 
chunk, or pick a different query parser like the "field" parser)

...but none of that changes the point of *why* you can/should use index 
time synonyms for situations like this.  the point of doing that is that 
at index time the alternate versions of the multi-word sequences can all 
be expanded and all variants are put in the index ... so it doesn't matter 
if you use a phrase query or term queries, all of the synonyms are in the 
indexed document.



-Hoss



Complex multi-value boosting

2009-11-23 Thread Michael Lugassy
Guys --

What schema will you use for 500K docs with a variety of 0-30
different category ids, each carrying its own weight and completely
overriding the default scoring?

For example, these documents:
A: 1:0.21, 2:0.41, 3:0.15 ...
B: 1:0.18, 2:0.65 4:0.98 ...
C: 6:0.75 ...
D: 2:0.14 ...

When searching "1" I'd like document A to appear first (has 0.21) and
when searching "1 || 2" i'd like document B to appear first (has an
aggregate score of 0.83 vs. 0.62). Currently I run this with full-text
after artificially repeating the number of each category's weight
(i.e. "1" would appear 21 times on a text field) - is there a better
way?

Best,

-- Michael


Re: creating Lucene document from an external XML file.

2009-11-23 Thread Chris Hostetter

: If I understand you correctly, you really want to be constructing 
: SolrInputDocuments (not Lucene's Documents) and indexing those with 
: SolrJ.  I don't think there is anything in the API that can read in an 

I read your question differently than Otis did.  My understanding is that 
you already have code that builds up files in the "..." update 
message syntax solr expects, but you want to modify those documents (w/o 
changing your existing code)

one possibility to think about is that instead of modifying the documents 
before sending them to Solr, you could write an UpdateProcessor that runs 
directly in Solr and gets access to those Documents after Solr has already 
parsed that XML (or even if the documents come from someplace else, like 
DIH, or a CSV file) and then make your changes.


If Otis and i have *both* misunderstood your question, please clarify.



-Hoss



RE: UTF-8 Character Set not specifed on OutputStreamWriter in StreamingUpdateSolrServer

2009-11-23 Thread Chris Hostetter

: Specifying the file.encoding did work, although I don't think it is a 
: suitable workaround for my use case.  Any idea what my next step is to 
: having a bug opened.

no, you shouldn't *have* to specify -Dfile.encoding=UTF8, Shalin was 
just asking you to try that to verify that it really was the extent of the 
problem.

I created a bug to track this...
https://issues.apache.org/jira/browse/SOLR-1595


-Hoss



Re: Announcing the Apache Solr extension in PHP - 0.9.0

2009-11-23 Thread Michael Lugassy
Thanks Israel, exactly what I was looking for, but how would one get a
pre-compiled dll for windows? using PHP 5.3 VS9 TS.

On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo  wrote:
> Fellow Apache Solr users,
>
> I have been working on a PHP extension for Apache Solr in C for quite
> sometime now.
>
> I just finished testing it and I have completed the initial user level
> documentation of the API
>
> Version 0.9.0-beta has just been released.
>
> It already has built-in readiness for Solr 1.4
>
> If you are using Solr 1.3 or later in PHP, I would appreciate if you could
> check it out and give me some feedback.
>
> It is very easy to install on UNIX systems. I am still working on the build
> for windows. It should be available for Windows soon.
>
> http://solr.israelekpo.com/manual/en/solr.installation.php
>
> A quick list of some of the features of the API include :
> - Built in serialization of Solr Parameter objects.
> - Reuse of HTTP connections across repeated requests.
> - Ability to obtain input documents for possible resubmission from query
> responses.
> - Simplified interface to access server response data (SolrObject)
> - Ability to connect to Solr server instances secured behind HTTP
> Authentication and proxy servers
>
> The following components are also supported
> - Facets
> - MoreLikeThis
> - TermsComponent
> - Stats
> - Highlighting
>
> Solr PECL Extension Homepage
> http://pecl.php.net/package/solr
>
> Some examples are available here
> http://solr.israelekpo.com/manual/en/solr.examples.php
>
> Interim Documentation Page until refresh of official PHP documentation
> http://solr.israelekpo.com/manual/en/book.solr.php
>
> The C source is available here
> http://svn.php.net/viewvc/pecl/solr/
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
>


Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-11-23 Thread javaxmlsoapdev

Anyone any idea?

javaxmlsoapdev wrote:
> 
> did you extend DIH to do this work? can you share code samples? I have a
> similar requirement where I need to index database records, and each record
> has a column with a document path, so I need to create another index for
> documents (we allow users to search both indexes separately) in parallel
> with reading some metadata for the documents from the database as well. I
> have all sorts of different document formats to index. I am on Solr 1.4.0.
> Any pointers would be appreciated.
> 
> Thanks,
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26485245.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Embedded solr with third party libraries

2009-11-23 Thread Chris Hostetter

: distribution. When we run test cases our schema.xml has a definition for lucid
: kstem and it throws a ClassNotFound exception.
: We declared the dependency for the two jars lucid-kstem.jar and
: lucid-solr-kstem.jar but it still throws an error.

explain what you mean by "declared the dependency"?

: 
C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr\conf\schema.xml
: 
: Now in order for the jar to be loaded should i copy the two jars to solr/lib
: directory. is that the default location embedded solr looks into for some
: default jars.

assuming "C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr" 
is your solr home dir, then yes you can copy your jars into 
"C:\DOCUME~1\username\LOCALS~1\Temp\solr-all\0.8194571792905493\solr\lib" 
and that should work ... or starting in Solr 1.4 you can use the <lib> 
directives to specify a jar anywhere on disk.  see the example 
solrconfig.xml for the syntax.
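For example, the Solr 1.4 <lib> directive in solrconfig.xml can point at a 
directory of jars, optionally filtered by a regex (the paths here are 
illustrative):

```
<!-- solrconfig.xml: load jars from outside the instance dir -->
<lib dir="C:/solr/custom-libs/" />
<lib dir="C:/solr/custom-libs/" regex="lucid-.*\.jar" />
```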



-Hoss



Re: how to get the autocomplete feature in solr 1.4?

2009-11-23 Thread Andrzej Bialecki

Chris Hostetter wrote:

: how to get the autocomplete/autosuggest feature in the solr1.4.plz give me
: the code also...

there is no magical "one size fits all" solution for autocomplete in solr.  
if you look at the archives there have been lots of discussions about 
different ways to get autocomplete functionality, using things like the 
TermsComponent or the LukeRequestHandler, and there are lots of examples 
of using the SolrJS javascript functionality to populate an autocomplete 
box -- but you'll have to figure out what solution works best for your 
goals.


Also, take a look at SOLR-1316; there are patches there that implement 
such a component using prefix trees.



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Huge load and long response times during search

2009-11-23 Thread Chris Hostetter

In addition to some of the other comments mentioned about IO, this caught 
my eye...

: I'm using SOLR(1.4) to search among about 3,500,000 documents. After the
: server kernel was updated to 64bit system has started to suffer.

...if the *only* thing that was upgraded was switching the kernel from 
32bit to 64bit, then perhaps you are getting bitten by java now using 64 bit 
pointers instead of 32 bit pointers, causing a lot more ram to be eaten up 
by the pointers?

it's not something i've done a lot of testing on, but i've heard other 
people claim that it can cause some serious problems if you don't actually 
need 64bit pointers for accessing huge heaps.

...that said, you should really double check exactly what changed 
when your server was upgraded ... perhaps the upgrade included a new 
filesystem type, or changes to RAID settings, or even hardware changes ... 
if your problems started when an upgrade took place, then looking into 
what exactly changed during the upgrade should be your first step.
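If the heap really does fit comfortably in memory, one JVM-level knob worth 
trying on recent 64-bit JVMs (Sun Java 6u14 and later) is compressed 
ordinary object pointers, which keep references 32 bits wide on heaps under 
roughly 32GB; for example:

```
java -Xmx4g -XX:+UseCompressedOops -jar start.jar
```

This only helps if the extra pointer width is actually the problem, so 
check the IO and upgrade angles first.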



-Hoss



Re: Oddness with Phrase Query

2009-11-23 Thread Chris Hostetter

: ?q="Here there be dragons"
: &qt=dismax
: &qf=title
...
: +DisjunctionMaxQuery((title:"here dragon")~0.01) ()

...the quotes cause the entire string to be passed to the analyzer for 
the title field and the resulting Tokens are used to construct a phrase 
query.

: ?q=Here there be dragons
: &qt=dismax
: &qf=title
...
: +((DisjunctionMaxQuery((title:here)~0.01) 
: DisjunctionMaxQuery((title:dragon)~0.01))~2) ()

...the lack of quotes just results in two term queries, that must be 
anywhere in the string.

: It looks like it might be related to 
...
: http://issues.apache.org/jira/browse/SOLR-879
: 
: Although I added enablePositionIncrements="true" to
: 
: 
: 
: in to the  for  in the 
: schema which didn't fix it - I presume this means that I have to reindex 
: everything (although the StopFilterFactory in  
: already had it).

...hmm, you shouldn't have to reindex everything.  are you sure you 
restarted solr after making the enablePositionIncrements="true" change to 
the query analyzer?

what do the offsets look like when you go to analysis.jsp and paste in that 
sentence?

the other thing to consider: you can increase the slop value on that
phrase query (to allow looser matching) using the "qs" param (query slop) 
... that could help in this situation (stop words getting stripped out of 
the query) as well as other situations (ie: what if the user just types 
"here be dragons" -- with or without stop words)
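As a concrete, hypothetical example, a request like this would let the 
title phrase match even with a couple of intervening positions:

```
http://localhost:8983/solr/select?qt=dismax&qf=title&qs=2&q="Here there be dragons"
```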



-Hoss



help with dataimport delta query

2009-11-23 Thread Joel Nylund
Hi, I have solr all working nicely, except I'm trying to get deltas to
work in my data import handler.


Here is a simplification of my data import config. I have a table
called "Book" which has categories; I'm doing subqueries for the
category info and calling a javascript helper. This all works
perfectly for the regular query.


I added these lines for the delta stuff:

deltaImportQuery="SELECT f.id,f.title
FROM Book f
WHERE f.id='${dataimporter.delta.job_jobs_id}'"
		deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
lastModifiedDate > '${dataimporter.last_index_time}'" >


basically I'm trying to select rows whose lastModifiedDate is newer than the
last index (or delta import) time.


I run:
http://localhost:8983/solr/dataimport?command=delta-import

And it says in logs:

Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DataImporter doDeltaImport

INFO: Starting Delta Import
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter  
readIndexerProperties

INFO: Read dataimport.properties
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
doDelta

INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=delta-import}  
status=0 QTime=0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Running ModifiedRowKey() for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed parentDeltaQuery for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Running ModifiedRowKey() for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
collectDelta

INFO: Completed parentDeltaQuery for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
doDelta

INFO: Delta Import completed successfully
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder  
execute

INFO: Time taken = 0:0:0.21

But the browser says no documents were added/modified (even though one
record in the db is a match).


Is there a way to turn on debugging so I can see the queries the DIH is
sending to the db?


Any other ideas of what I could be doing wrong?

thanks
Joel



		deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND  
lastModifiedDate > '${dataimporter.last_index_time}'"  >


   
   
 		transformer="script:SplitAndPrettyCategory" query="select fc.bookId,  
group_concat(cr.name) as categoryName,

 from BookCat fc
 where fc.bookId = '${item.id}' AND
 group by fc.bookId">
 
 

   




Spellcheck: java.lang.RuntimeException: java.io.IOException: read past EOF

2009-11-23 Thread ranjitr

Hello,

Solr 1.3 reported the following error when our app tried to query it:

java.lang.RuntimeException: java.io.IOException: read past EOF
at
org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:91)
at
org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:108)
.

I noticed that there were about 54 segments.* files under the spellcheck
directory. The way I resolved the problem was by going into the spellcheck
directory & deleting off all the files in it. I then issued a curl command
to rebuild the spellcheck index (I also did a full-import & reload of the
main index, to be safe.)

When this error occurred, our solrconfig.xml had spellcheck.build set to
true. This was a configuration error on our part. I was wondering if the
spellcheck index being re-built for each query could have caused the above
exception to occur.

Kindly clarify.

Thanks,
Ranjit.
-- 
View this message in context: 
http://old.nabble.com/Spellcheck%3A-java.lang.RuntimeException%3A-java.io.IOException%3A-read-past-EOF-tp26484580p26484580.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Disable coord

2009-11-23 Thread Chris Hostetter

: Thanks for your reply.  Nested boolean queries is a valid concern.  I also
: realized that isCoordDisabled needs to be considered in
: BooleanQuery.hashCode so that a query with coord=false will have a different
: cache key in Solr.

Hmmm... you're right, BooleanQuery.hashCode doesn't consider disableCoord.  
that's a nasty bug...

https://issues.apache.org/jira/browse/LUCENE-2092


-Hoss



Re: how to get the autocomplete feature in solr 1.4?

2009-11-23 Thread Chris Hostetter

: how to get the autocomplete/autosuggest feature in the solr1.4.plz give me
: the code also...

there is no magical "one size fits all" solution for autocomplete in solr.  
if you look at the archives there have been lots of discussions about 
different ways to get autocomplete functionality, using things like the 
TermsComponent or the LukeRequestHandler, and there are lots of examples 
of using the SolrJS javascript functionality to populate an autocomplete 
box -- but you'll have to figure out what solution works best for your 
goals.



: -- 
: View this message in context: 
http://old.nabble.com/how-to-get-the-autocomplete-feature-in-solr-1.4--tp26402992p26402992.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



Re: Where is upgrading documentation?

2009-11-23 Thread Chris Hostetter
: Subject: Re: Where is upgrading documentation?
: 
: CHANGES.txt contains information, but no instructions.

Hmmm, i see what you mean. we typically have a nice boilerplate set of 
instructions, and somehow those got removed from the 1.4 changes.

in a nutshell, the instructions are the same as 1.3...


IMPORTANT UPGRADE NOTE: In a master/slave configuration, all searchers/slaves
should be upgraded before the master!  If the master were to be updated
first, the older searchers would not be able to read the new index format.

...

Older Apache Solr installations can be upgraded by replacing
the relevant war file with the new version.  No changes to configuration
files should be needed.  

This version of Solr contains a new version of Lucene implementing
an updated index format.  This version of Solr/Lucene can still read
and update indexes in the older formats, and will convert them to the new
format on the first index change.  Be sure to backup your index before 
upgrading in case you need to downgrade.


: http://wiki.apache.org/solr/Solr1.4


-Hoss



Re: Index time boosting troubles

2009-11-23 Thread Chris Hostetter

: I had working index time boosting on documents like so: 
: 
: Everything was great until I made some changes that I thought were not
: related to the doc boost, but after that my doc boosting appears to be
: missing.
: 
: I'm having a tough time debugging this and didn't have the sense to version
: control this so I would have something to revert to (lesson learned).
: 
: In schema.xml I have 

...I don't really understand your question.  What does that one fieldtype 
have to do with your specific issue?  If you post your whole schema, and 
some examples of the types of docs you are indexing and the queries you 
are trying, then people can probably help you see how/when/why your index 
time boosts come into play, but a single fieldtype from your schema 
without any context doesn't give us much to go on.


-Hoss



Re: Fwd: solr index-time boost... help required please

2009-11-23 Thread Chris Hostetter

: Now I am trying *index-time* boosting to improve response time. So I created
: an algorithm where I do the following:-
: 1. sort the records I get from the database on approval_dt asc and increase the
: boost value of the element for approval_dt by 0.1 as I encounter
: higher approval_dt records. If there is no approval_dt for a record, there is no
: boost value for it. I made omitNorms=false in schema.xml for the approval_dt
: field. Now when I apply the same query nothing special happens, i.e. I don't
: even see the latest dates first.

Index time boosting of a field just affects the fieldNorms for the specific 
field you apply the boost to -- if you don't search on that field (with a 
score based query type), the boost doesn't affect things.  So if you 
applied an index time boost to some field named "approval_dt", then that 
boost isn't going to matter unless you query against the approval_dt field 
-- but if you use something like a range query, the boost still won't 
matter, because range queries don't affect the score.

More than likely what you want to do is use a *document* boost instead of 
a field boost .. that way the boost factor gets applied to any field you 
have that includes norms, so no matter what field you query on, the 
boost will get applied.
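To make the document-boost vs. field-boost distinction concrete, here is a small sketch that builds a Solr XML add message with a `boost` attribute on the `<doc>` element (the document-level boost described above). Field names and values are illustrative only.

```python
import xml.etree.ElementTree as ET

def add_doc_xml(fields, doc_boost=None):
    """Build a Solr <add> message; an optional boost attribute on <doc>
    applies a document-level (not per-field) index-time boost."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))   # document boost, affects all normed fields
    for name, value in fields.items():
        f = ET.SubElement(doc, "field", name=name)
        f.text = str(value)
    return ET.tostring(add, encoding="unicode")

print(add_doc_xml({"id": "42", "approval_dt": "2009-11-01T00:00:00Z"},
                  doc_boost=2.5))
```

A per-field boost would instead put the `boost` attribute on the individual `<field>` element, which (as explained above) only matters for queries that score against that field.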

: 2. If we boost a doc or field in the xml should we again use the bf
: parameter with a function to put the boost into effect while querying when
: trying index-time boost also?

Index time boosts and query boosts are completely orthogonal; you can use 
both together, but they don't require (or know about) each other at all.

: 3. Also can you frame a query for me to see the latest approval_dt coming
: first using the index-time boost approach.

Not with the setup you've described ... date based queries really won't 
ever look at the norms for the date field (unless you did a term query for 
a very specific date value).

: 4. Does bf function play any role in solrconfig.xml when we plan to use
: index-time boost. My understanding is bf is used only for query-time boost.

you are correct.

: 5. Is it necessary to use bq in case of index time boost.

same answer as #2.


-Hoss



Re: Factory cannot be cast

2009-11-23 Thread Chris Hostetter

: previously I was using a NGramFilterFactory for the completion on my website
: but the EdgeNGramTokenizerFactory seems to be more pertinent.
: 
: I defined my own field type  but when I start solr I got the error log :
: 
: GRAVE: java.lang.ClassCastException:
: org.apache.solr.analysis.EdgeNGramTokenizerFactory cannot be cast to
: org.apache.solr.analysis.Toke
: nFilterFactory

You can't use a TokenizerFactory as a TokenFilterFactory -- they do very 
different things.  A Tokenizer is responsible for converting a stream of 
characters into a stream of Tokens, while a TokenFilter is responsible for 
processing an existing stream of Tokens and producing a (modified) stream 
of Tokens.



-Hoss



RE: schema-based Index-time field boosting

2009-11-23 Thread Chris Hostetter

: Yeah, like I said, I was mistaken about setting field boost in
: schema.xml - doesn't mean it's a bad idea though.  At any rate, from
: your penultimate sentence I reckon at least one of us is still confused
: about field boosting, feel free to reply if you think it's me ;)

Yeah ... I think it's you.  Like I said...

: field boosting only makes sense if it's only applied to some of the
: documents in the index, if every document has an index time boost on
: fieldX, then that boost is meaningless.

...if there was a way to boost fields at index time that was configured in 
the schema.xml, then every doc would get that boost on its instances of 
those fields, but the only purpose of index time boosting is to indicate 
that one document is more significant than another doc -- if every doc 
gets the same boost, it becomes a no-op.

(think about the math -- field boosts become multipliers in the fieldNorm 
-- if every doc gets the same multiplier, then there is no net effect)
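That no-op claim is easy to verify numerically. The toy scores below are made up (this is not real Lucene scoring), but they show that multiplying every document's fieldNorm by the same constant leaves the ranking unchanged:

```python
# Toy scores: any uniform multiplier folded into the fieldNorm scales
# every document's score equally, so the relative ranking cannot change.
raw = {"docA": 1.2, "docB": 0.8, "docC": 2.0}

def ranking(scores):
    """Doc ids ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

boost = 4.0  # the same index-time boost applied to every document
boosted = {doc: score * boost for doc, score in raw.items()}

print(ranking(raw))      # ['docC', 'docA', 'docB']
print(ranking(boosted))  # identical order: the uniform boost is a no-op
```

Only a boost applied to *some* documents (or some field instances) changes relative order, which is exactly Hoss's point.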



-Hoss



Re: NPE when trying to view a specific document via Luke

2009-11-23 Thread Chris Hostetter

: I think thats the case - I'm not seeing the problem - though I didn't
: follow your steps exactly, because I also set the data dir.

Yeah ... I went back and tested again and verified that's what was 
happening.

There is a bug with Luke when viewing "binary" based fields 
introduced in Solr 1.4 (like the new Trie fields) which yonik has fixed in 
SOLR-1563 but i can't trigger any similar problems when using an existing 
1.3 schema and/or index.


-Hoss



Re: error with multicore CREATE action

2009-11-23 Thread Chris Harris
Are there any use cases for CREATE where the instance directory
*doesn't* yet exist? I ask because I've noticed that Solr will create
an instance directory for me sometimes with the CREATE command. In
particular, if I run something like

http://solrhost/solr/admin/cores?action=CREATE&name=newcore&instanceDir=d:\dir_that_does_not_exist\&config=C:\dir_that_does_exist\solrconfig.xml&schema=C:\dir_that_does_exist\schema.xml

then Solr will create

d:\dir_that_does_not_exist

and

d:\dir_that_does_not_exist\data

for me (but not d:\dir_that_does_not_exist\conf).

Maybe this has to do with some peculiarity in my solrconfig.xml?
(There I've commented out the dataDir element because I prefer the
default behavior to what you get with "${solr.data.dir:./solr/data}".)

2009/11/23 Shalin Shekhar Mangar :

> The instance directory and the configuration files should exist before you
> can create a core. The core CREATE command just creates a Solr core instance
> in memory after reading the configuration from disk.
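The precondition Shalin states above -- the instance directory and config files must exist on disk before CREATE -- can be sketched as a small setup step. Paths, the core name, and the idea of copying from a template conf directory are assumptions for illustration:

```python
import os
import shutil
from urllib.parse import urlencode

def prepare_and_build_create_url(instance_dir, template_conf,
                                 name="newcore",
                                 base="http://localhost:8983/solr/admin/cores"):
    """Create instanceDir/conf and copy solrconfig.xml + schema.xml into it,
    then return the CoreAdmin CREATE URL to request afterwards."""
    conf = os.path.join(instance_dir, "conf")
    os.makedirs(conf, exist_ok=True)
    for fname in ("solrconfig.xml", "schema.xml"):
        shutil.copy(os.path.join(template_conf, fname), conf)
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return base + "?" + urlencode(params)
```

Running the returned URL against Solr then only loads the already-present configuration into memory, as described above.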


Re: Output all, from one field

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 4:29 PM, Jörg Agatz wrote:

> Hallo,
>
> I am searching for a way to output all content from one field.
>
> Like name:
>
> "NAME:*"
>
> And Solr gives me all names,
>
> or "color:*"
>
> and I get all colors.
>
> Can I do this? Or is it impossible?
>
>
Do you want to return just one field from all documents? If yes, you can:

   1. Query with q=*:*&fl=name
   2. Use TermsComponent - http://wiki.apache.org/solr/TermsComponent


-- 
Regards,
Shalin Shekhar Mangar.


Re: solr artifacts / apache maven repository

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 11:04 PM, TCK  wrote:

> Thanks, yes that's what I do as well. I'd like to pull in the standard solr
> war distribution from a public repository and then explode it and put my
> own
> overlays.
>
> Shalin, any pointers to where I would look to go about making a patch to
> the
> artifact publishing process?
>
>
That'd be great. See http://wiki.apache.org/solr/HowToContribute

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr artifacts / apache maven repository

2009-11-23 Thread TCK
Thanks, yes that's what I do as well. I'd like to pull in the standard solr
war distribution from a public repository and then explode it and put my own
overlays.

Shalin, any pointers to where I would look to go about making a patch to the
artifact publishing process?

Thanks,
TCK




On Mon, Nov 23, 2009 at 12:23 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 23, 2009 at 10:41 PM, Stephen Duncan Jr <
> stephen.dun...@gmail.com> wrote:
>
> > I currently put the war into my own Nexus repository.  I use it to build
> a
> > war-overlay with the solr war to include my plugins & customizations (due
> > to
> > classloading issues with Spring and the external plugin solution).
> >
> >
> I see. If people find it generally useful, we could publish the war too.
> Patches welcome :)
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: error with multicore CREATE action

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 10:47 PM, Marc Sturlese wrote:

>
> Hey there,
> I am using Solr 1.4 out of the box and am trying to create a core at
> runtime
> using the CREATE action.
> I am getting this error when executing:
>
> http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=data
>
> Nov 23, 2009 6:18:44 PM org.apache.solr.core.SolrResourceLoader 
> INFO: Solr home set to 'solr/x/'
> Nov 23, 2009 6:18:44 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error executing default
> implementation of CREATE
>at
>
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:250)
>at
> 



>
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
> in classpath or 'solr/x/conf/',
> cwd=/home/smack/Desktop/apache-solr-1.4.0/example
>at
>
> org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:260)
>at
>
> org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:228)
>at org.apache.solr.core.Config.(Config.java:101)
>at org.apache.solr.core.SolrConfig.(SolrConfig.java:130)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:405)
>at
>
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:245)
>... 21 more
>
> I don't know if I am missing something. Should I manually create the folders
> and the schema and solrconfig files?
>
>
The instance directory and the configuration files should exist before you
can create a core. The core CREATE command just creates a Solr core instance
in memory after reading the configuration from disk.

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr artifacts / apache maven repository

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 10:41 PM, Stephen Duncan Jr <
stephen.dun...@gmail.com> wrote:

> I currently put the war into my own Nexus repository.  I use it to build a
> war-overlay with the solr war to include my plugins & customizations (due
> to
> classloading issues with Spring and the external plugin solution).
>
>
I see. If people find it generally useful, we could publish the war too.
Patches welcome :)

-- 
Regards,
Shalin Shekhar Mangar.


error with multicore CREATE action

2009-11-23 Thread Marc Sturlese

Hey there,
I am using Solr 1.4 out of the box and am trying to create a core at runtime
using the CREATE action.
I am getting this error when executing:
http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=data

Nov 23, 2009 6:18:44 PM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr/x/'
Nov 23, 2009 6:18:44 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error executing default
implementation of CREATE
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:250)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:111)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
in classpath or 'solr/x/conf/',
cwd=/home/smack/Desktop/apache-solr-1.4.0/example
at
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:260)
at
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:228)
at org.apache.solr.core.Config.(Config.java:101)
at org.apache.solr.core.SolrConfig.(SolrConfig.java:130)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:405)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:245)
... 21 more

I don't know if I am missing something. Should I manually create the folders
and the schema and solrconfig files?

-- 
View this message in context: 
http://old.nabble.com/error-with-multicore-CREATE-action-tp26482255p26482255.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Very busy search screen

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 10:36 PM, javaxmlsoapdev  wrote:

>
> I have a client who wants to search on almost every attribute of an object
> (nearly 15 attributes) on the search screen. The search screen looks very
> crazy/busy. I was wondering if there are better ways to address these
> requirements and build intelligent categorized/configurable searches,
> including allowing users to choose whether they want to AND or OR attributes, etc.?
> Any pointers would be appreciated.
>
>
You can go with a simple text box search on a catch-all field, with facets for
drilling down. That's how most of us do it. If your client really wants
complete control you'd have to educate them on Solr's query syntax (or
perhaps create a simpler query syntax), but I wouldn't suggest going that
way.

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr artifacts / apache maven repository

2009-11-23 Thread Stephen Duncan Jr
I currently put the war into my own Nexus repository.  I use it to build a
war-overlay with the solr war to include my plugins & customizations (due to
classloading issues with Spring and the external plugin solution).

On Mon, Nov 23, 2009 at 12:08 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 23, 2009 at 10:31 PM, TCK  wrote:
>
> > Hi,
> >
> > I'd like to pull in the solr war from a public repository. I'm able to
> find
> > the individual jars at http://repo2.maven.org/maven2/org/apache/solr/but
> > it
> > seems like the war artifact isn't published. Is there a reason for this
> or
> > is it published elsewhere ?
> >
> >
> The war is not published as a maven artifact. Why would you need the war in
> maven?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: solr artifacts / apache maven repository

2009-11-23 Thread Shalin Shekhar Mangar
On Mon, Nov 23, 2009 at 10:31 PM, TCK  wrote:

> Hi,
>
> I'd like to pull in the solr war from a public repository. I'm able to find
> the individual jars at http://repo2.maven.org/maven2/org/apache/solr/ but
> it
> seems like the war artifact isn't published. Is there a reason for this or
> is it published elsewhere ?
>
>
The war is not published as a maven artifact. Why would you need the war in
maven?

-- 
Regards,
Shalin Shekhar Mangar.


Very busy search screen

2009-11-23 Thread javaxmlsoapdev

I have a client who wants to search on almost every attribute of an object
(nearly 15 attributes) on the search screen. The search screen looks very
crazy/busy. I was wondering if there are better ways to address these
requirements and build intelligent categorized/configurable searches,
including allowing users to choose whether they want to AND or OR attributes, etc.?
Any pointers would be appreciated.

thanks,
-- 
View this message in context: 
http://old.nabble.com/Very-busy-search-screen-tp26482092p26482092.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Control DIH from PHP

2009-11-23 Thread Pablo Ferrari
Thank you

2009/11/21 Lance Norskog 

> Nice! I didn't notice that before. Very useful.
>
> 2009/11/19 Noble Paul നോബിള്‍  नोब्ळ् :
> > you can pass the uniqueId as a param and use it in a sql query
> >
> http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
> .
> > --Noble
> >
> > On Thu, Nov 19, 2009 at 3:53 PM, Pablo Ferrari 
> wrote:
> >> Most specificly, I'm looking to update only one document using it's
> Unique
> >> ID: I dont want the DIH to lookup the whole database because I already
> know
> >> the Unique ID that has changed.
> >>
> >> Pablo
> >>
> >> 2009/11/19 Pablo Ferrari 
> >>
> >>>
> >>>
> >>> Hello!
> >>>
> >>> After been working in Solr documents updates using direct php code
> (using
> >>> SolrClient class) I want to use the DIH (Data Import Handler) to update
> my
> >>> documents.
> >>>
> >>> Any one knows how can I send commands to the DIH from php? Any idea or
> >>> tutorial will be of great help because I'm not finding anything useful
> so
> >>> far.
> >>>
> >>> Thank you for you time!
> >>>
> >>> Pablo
> >>> Tinkerlabs
> >>>
> >>
> >
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: "query" function query; what's it for?

2009-11-23 Thread Yonik Seeley
On Mon, Nov 23, 2009 at 9:36 AM, Smiley, David W.  wrote:
> Thanks Yonik. That blog post was very interesting but it only has about a 
> sentence or two on the query() function

Ah, I had thought that you meant the "query" QParser.
The query function just allows you to use any other query type inside
a function query.
So you could add or multiply the scores of two dismax queries
together, or whatever.

As far as practical usecases... I've used it for selectively boosting
one query based on the results of another query.  I'm sure others will
find other uses for it.

-Yonik
http://www.lucidimagination.com
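A hedged sketch of the "add the scores of two queries together" idea Yonik mentions, following the nested-query style from the linked blog post; the parameter names (`q1`, `q2`) and field names are illustrative, not taken from any real configuration:

```python
from urllib.parse import urlencode

# Combine the scores of two dismax sub-queries with sum() inside a
# function query; $q1/$q2 are Solr parameter dereferences.
params = {
    "q": "{!func}sum(query($q1), query($q2))",
    "q1": "{!dismax qf=title v='solr'}",
    "q2": "{!dismax qf=body v='solr'}",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Each document's final score is then the sum of its scores from the two sub-queries, which is the "selectively boosting one query based on the results of another" use case described above.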


Re: "query" function query; what's it for?

2009-11-23 Thread Smiley, David W.
Thanks Yonik. That blog post was very interesting, but it only has about a 
sentence or two on the query() function, and it points the user to the same 
link I have here at the wiki for examples.  But those examples (really just 
one example) don't explain the point -- e.g. when/why would I use this?

On Nov 22, 2009, at 11:11 PM, Yonik Seeley wrote:

> On Sun, Nov 22, 2009 at 11:06 PM, David Smiley @MITRE.org
>  wrote:
>> It's not clear to me what purpose the "query" function query solves.  I've
>> read the description:
>> http://wiki.apache.org/solr/FunctionQuery#query  but it doesn't really
>> explain the point of it. I'm sure it has to do with subtleties in how
>> scoring is done.  Can someone please present a use-case?
> 
> See the "Pure Nested Query" section here:
> http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
> 
> -Yonik
> http://www.lucidimagination.com



Re: access denied to solr home lib dir

2009-11-23 Thread Charles Moad
Check.  I even verified that the tomcat user could create the
directory (i.e. "sudo -u tomcat6 mkdir /opt/solr/steve/lib").  Still
solr complains.

On Sun, Nov 22, 2009 at 10:03 PM, Yonik Seeley  wrote:
> Maybe ensuring that the full parent path (all parent directories) have
> "rx" permissions?
>
> -Yonik
> http://www.lucidimagination.com
>
> On Sun, Nov 22, 2009 at 2:59 PM, Charles Moad  wrote:
>>    I have been trying to get a new solr install setup on Ubuntu 9.10
>> using tomcat6.  I have tried the solr 1.4 release and the latest svn
>> for good measure.  No matter what, I am running into the following
>> permission error.  I removed all the lib includes from solrconfig.xml.
>> I have created the "/opt/solr/steve/lib" directory and all permissions
>> are good.  This directory is optional, but I just cannot get past
>> this.  I've installed solr 1.3 many times without running into this on
>> redhat boxes.
>>
>> Thanks,
>>    Charlie
>>
>> Nov 22, 2009 2:48:53 PM org.apache.catalina.core.StandardContext filterStart
>> SEVERE: Exception starting filter SolrRequestFilter
>> org.apache.solr.common.SolrException:
>> java.security.AccessControlException: access denied
>> (java.io.FilePermission /opt/solr/steve/./lib read)
>>       at 
>> org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:68)
>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
>> Method)
>>       at 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>       at 
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>       at java.lang.Class.newInstance0(Class.java:355)
>>       at java.lang.Class.newInstance(Class.java:308)
>>       at 
>> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
>>       at 
>> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>>       at 
>> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
>>       at 
>> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
>>       at 
>> org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
>>       at 
>> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
>>       at 
>> org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
>>       at 
>> org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at 
>> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
>>       at 
>> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
>>       at 
>> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
>>       at 
>> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
>>       at 
>> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
>>       at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
>>       at 
>> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
>>       at 
>> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
>>       at 
>> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>>       at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
>>       at 
>> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
>>       at 
>> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
>>       at 
>> org.apache.catalina.core.StandardService.start(StandardService.java:516)
>>       at 
>> org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
>>       at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at 
>> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
>> Caused by: java.security.AccessControlException: access denied
>> (java.io.FilePermission /opt/solr/steve/./lib read)
>>       at 
>> java.security.AccessControlContex

Re: [N to M] range search out of sum of field. howto search this?

2009-11-23 Thread Yonik Seeley
See frange:
http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

fq={!frange l=5 u=10}sum(user,num)

-Yonik
http://www.lucidimagination.com
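Note that frange filters on a per-document function value; the per-user aggregation the original question asks for is sketched below in plain Python as a reference for the intended result (the frange approach assumes each user's total is available as fields on a single document):

```python
from collections import defaultdict

# The example documents from the question.
docs = [
    {"user": 1, "num": 5},
    {"user": 1, "num": 8},
    {"user": 5, "num": 7},
    {"user": 5, "num": 1},
]

def users_in_sum_range(docs, lo, hi):
    """Sum num per user, keep users whose total falls in [lo, hi]."""
    totals = defaultdict(int)
    for d in docs:
        totals[d["user"]] += d["num"]
    return sorted(u for u, total in totals.items() if lo <= total <= hi)

print(users_in_sum_range(docs, 5, 10))  # [5] -- user 1 totals 13, outside the range
```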



On Mon, Nov 23, 2009 at 8:49 AM, Julian Davchev  wrote:
> Hi folks,
> I got documents like
> user:1   num:5
> user:1   num: 8
> user:5   num:7
> user:5   num:1
> 
>
>
> I'd like to get each user whose sum of num falls in the range 5 to 10.
> In this case it should return user 5, as 7+1=8 is within the range.
> User 1 will not match, because its sum of num is 5+8=13, hence outside the range 5 to 10.
>
> Thanks
>


[N to M] range search out of sum of field. howto search this?

2009-11-23 Thread Julian Davchev
Hi folks,
I got documents like
user:1   num:5
user:1   num: 8
user:5   num:7
user:5   num:1



I'd like to get each user whose sum of num falls in the range 5 to 10.
In this case it should return user 5, as 7+1=8 is within the range.
User 1 will not match, because its sum of num is 5+8=13, hence outside the range 5 to 10.

Thanks


Re: Function queries question

2009-11-23 Thread Grant Ingersoll

On Nov 23, 2009, at 6:54 AM, Oliver Beattie wrote:

> Thanks for getting back to me. I've added inline responses below.
> 
> 2009/11/20 Grant Ingersoll 
>> 
>> On Nov 20, 2009, at 3:15 AM, Oliver Beattie wrote:
>> 
>>> Hi all,
>>> 
>>> I'm a relative newcomer to Solr, and I'm trying to use it in a project
>>> of mine. I need to do a function query (I believe) to filter the
>>> results so they are within a certain distance of a point. For this, I
>>> understand I should use something like sqedist or hsin, and from the
>>> documentation on the FunctionQuery page, I believe that the function
>>> is executed on every "row" (or "record", not sure what the proper term
>>> for this is). So, my question is threefold really; are those functions
>>> the ones I should be using to perform a search where distance is one
>>> of the criteria (there are others),
>> 
>> Short answer: yes.  Long answer:  I just committed those functions this 
>> week.  I believe they are good, but feedback is encouraged.
> 
> I'll be sure to let you know if I find anything report-worthy :)
> They're definitely super-useful for people doing similar things to me,
> though, so great work :)
> 
>>> and if so, does Solr execute the
>>> query on every row (and again, if so, is there any way of preventing
>>> this [like subqueries, though I know they're not supported])?
>> 
>> You can use the frange capability to filter first.  See 
>> http://www.lucidimagination.com/blog/tag/frange/
> 
> Thanks for the link. I'll definitely do that. Does Solr execute the
> function on every row in the database on every query otherwise?

If the query is unrestricted by other clauses or by filters, yes it will 
execute over all docs in the index.


> 
>> 
>> Here's an example from a soon to be published article I'm writing:
>> http://localhost:8983/solr/select/?q=*:*&fq={!frange l=0 
>> u=400}hsin(0.57, -1.3, lat_rad, lon_rad,  3963.205)
>> 
>> This should filter out all documents that are beyond 400 miles in distance 
>> from that point on a sphere (specified in radians, see also the rads() 
>> method)
>> 
>> 
>> 
>>> 
>>> Sorry if this is a little confusing… any help would be greatly appreciated 
>>> :)

Which part?  The hsin() part calculates the distance between the point 0.57, 
-1.3 and the values in the fields lat_rad, lon_rad and is using 3963.205 as the 
radius of the sphere (which is the approx. radius of the Earth in miles).  The 
frange stuff then filters such that it only accepts docs that have a value for 
hsin between 0 and 400.
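For reference, the haversine formula Grant describes can be sketched in a few lines. This is a plain-Python rendition of the math, not Solr's implementation; inputs are in radians and the last argument is the sphere's radius (so the result is in the radius's units):

```python
import math

def hsin(lat1, lon1, lat2, lon2, radius):
    """Great-circle distance between two points given in radians."""
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))

# Distance from (0.57, -1.3) rad to a point 0.01 rad further north,
# using 3963.205 miles as the Earth's radius.
print(round(hsin(0.57, -1.3, 0.58, -1.3, 3963.205), 1))  # → 39.6
```

The frange wrapper then simply keeps the documents for which this value lands between `l` and `u` (0 and 400 miles in the example query).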

-Grant



Re: Boost document base on field length

2009-11-23 Thread Grant Ingersoll

On Nov 23, 2009, at 8:01 AM, Tomasz Kępski wrote:

> Hi,
> 
> I would like to boost documents with longer descriptions, to move down 
> documents with a 0-length description.
> I'm wondering if there is a possibility to boost a document based on the field 
> length while searching, or if the only way is to store the field length as an int in 
> a separate field while indexing?

Override the default Similarity (see the end of the schema.xml file) with your 
own Similarity implementation and then in that class override the lengthNorm() 
method.
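As a sketch of what that override changes: Lucene's default lengthNorm is roughly 1/sqrt(numTerms), so longer fields already get *smaller* norms; to favor longer descriptions you would invert that trend. The Python below only illustrates the math (the real override is a Java Similarity subclass, and the `favor_long_descriptions` shape and its cap are hypothetical):

```python
import math

def default_length_norm(num_terms):
    """Approximation of Lucene's default: shorter fields score higher."""
    return 1.0 / math.sqrt(num_terms) if num_terms > 0 else 0.0

def favor_long_descriptions(num_terms, cap=100):
    """Hypothetical replacement: norm grows with field length up to a cap,
    and an empty field gets the smallest possible norm."""
    return math.sqrt(min(num_terms, cap)) / math.sqrt(cap)

assert default_length_norm(100) < default_length_norm(10)   # default penalizes length
assert favor_long_descriptions(100) > favor_long_descriptions(10)  # override rewards it
assert favor_long_descriptions(0) == 0.0  # zero-length descriptions sink
```

Also keep in mind that norms are stored in a single byte, so whatever function you choose is heavily quantized at index time.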

ExtractingRequestHandler commitWithin

2009-11-23 Thread j philoon

Any chance of getting the ExtractingRequestHandler to use the commitWithin
parameter?
-- 
View this message in context: 
http://old.nabble.com/ExtractingRequestHandler-commitWithin-tp26478144p26478144.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Index time boosts, payloads, and long query strings

2009-11-23 Thread Erick Erickson
Yep 

On Mon, Nov 23, 2009 at 4:13 AM, Girish Redekar
wrote:

> Thanks Erick!
>
> After reading your answer, and re-reading the Solr wiki, I realized my
> folly. I used to think that index-time boosts when applied on a per-field
> basis are equivalent to query time boosts to that field.
>
> To ensure that my new understanding is correct, I'll state it in my own words.
> Index time boosts will determine boost for a *document* if it is counted as
> a hit. Query time boosts give you control on boosting the occurrence of a
> query in a specific field.
>
> Please correct me if I'm wrong (again) :-)
>
> Girish Redekar
> http://girishredekar.net
>
>
> On Sun, Nov 22, 2009 at 8:25 PM, Erick Erickson  >wrote:
>
> > I still think they are apples and oranges. If you boost *all* titles,
> > you're effectively boosting none of them. Index time boosting
> > expresses "this document's title is more important than other
> > document titles." What I think you're after is "titles are more
> > important than other parts of the document."
> >
> > For this latter, you're talking query-time boosting. Boosting only
> > really makes sense if there are multiple clauses, something
> > like title:important OR body:unimportant. If this is true, speed
> > is irrelevant, you need correct behavior.
> >
> > Not that I think you'd notice either way. Modern computers
> > can do a LOT of FLOPS/sec. Here's an experiment: time
> > some queries (but beware of timing the very first ones, see
> > the Wiki) with boosts and without boosts. I doubt you'll see
> > enough difference to matter (but please do report back if you
> > do, it'll further my education ).
> >
> > But, depending on your index structure, you may get this
> > anyway. Generally, matches on shorter fields weigh more
> > in the score calculations than on longer fields. If you have
> > fields like title and body and you are querying on title:term OR
> > body:term, documents with term in the title will tend toward
> > higher scores.
> >
> > But before putting too much effort into this, do you have any
> > evidence that the default behavior is unsatisfactory? Because
> > unless and until you do, I think this is a distraction ...
> >
> > Best
> > Erick
> >
> > On Sun, Nov 22, 2009 at 8:37 AM, Girish Redekar
> > wrote:
> >
> > > Hi Erick -
> > >
> > > Maybe I mis-wrote.
> > >
> > > My question is: would "title:any_query^4.0" be faster/slower than
> > applying
> > > index time boost to the field title. Basically, if I take *every* user
> > > query
> > > and search for it in title with boost (say, 4.0) - is it different than
> > > saying field title has boost 4.0?
> > >
> > > Cheers,
> > > Girish Redekar
> > > http://girishredekar.net
> > >
> > >
> > > On Sun, Nov 22, 2009 at 2:02 AM, Erick Erickson <
> erickerick...@gmail.com
> > > >wrote:
> > >
> > > > I'll take a whack at index .vs. query boosting. They are expressing
> > very
> > > > different concepts. Let's claim we're interested in boosting the
> title
> > > > field
> > > >
> > > > Index time boosting is expressing "this document's title is X more
> > > > important
> > > >
> > > > than a normal document title". It doesn't matter *what* the title is,
> > > > any query that matches on anything in this document's title will give
> > > this
> > > > document a boost. I might use this to give preferential treatment to
> > all
> > > > encyclopedia entries or something.
> > > >
> > > > Query time boosting, like "title:solr^4.0" expresses "Any document
> with
> > > > solr
> > > > in
> > > > it's title is more important than documents without solr in the
> title".
> > > > This
> > > > really
> > > > only makes sense if you have other clauses that might cause a
> document
> > > > *without*
> > > > solr  the title to match..
> > > >
> > > > Since they are doing different things, efficiency isn't really
> > relevant.
> > > >
> > > > HTH
> > > > Erick
> > > >
> > > >
> > > > On Sat, Nov 21, 2009 at 2:13 AM, Girish Redekar
> > > > wrote:
> > > >
> > > > > Hi ,
> > > > >
> > > > > I'm relatively new to Solr/Lucene, and am using Solr (and not
> lucene
> > > > > directly) primarily because I can use it without writing java code
> > > (rest
> > > > of
> > > > > my project is python coded).
> > > > >
> > > > > My application has the following requirements:
> > > > > (a) ability to search over multiple fields, each with different
> > weight
> > > > > (b) If possible, I'd like to have the ability to add
> extra/diminished
> > > > > weights to particular tokens within a field
> > > > > (c) My query strings have large lengths (50-100 words)
> > > > > (d) My index is 500K+  documents
> > > > >
> > > > > 1) The way to (a) is field boosting (right?). My question is: Is
> all
> > > > field
> > > > > boosting done at query time? Even if I give index time boosts to
> > > fields?
> > > > Is
> > > > > there a performance advantage in boosting fields at index time vs
> at
> > > > using
> > > > > something like fieldname:querystring^boost.
> > > > > 2) From

Boost document base on field length

2009-11-23 Thread Tomasz Kępski

Hi,

I would like to boost documents with longer descriptions, so that 
documents with a zero-length description move down.
I'm wondering if it is possible to boost a document based on the field 
length at search time, or if the only way is to store the field length as 
an int in a separate field at index time?


Tom
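
One common approach is the second option Tom mentions: store the length in an int field at index time and fold it into the score with a boost function at query time. A sketch only, not tested config; the `desc_length` field name and the example query are assumptions:

```
# Hypothetical query: scale the relevance score by log(desc_length + 10),
# so documents with empty descriptions get the smallest multiplier.
http://localhost:8983/solr/select?q={!boost b=log(sum(desc_length,10))}description:foo
```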


Re: Function queries question

2009-11-23 Thread Oliver Beattie
Thanks for getting back to me. I've added inline responses below.

2009/11/20 Grant Ingersoll 
>
> On Nov 20, 2009, at 3:15 AM, Oliver Beattie wrote:
>
> > Hi all,
> >
> > I'm a relative newcomer to Solr, and I'm trying to use it in a project
> > of mine. I need to do a function query (I believe) to filter the
> > results so they are within a certain distance of a point. For this, I
> > understand I should use something like sqedist or hsin, and from the
> > documentation on the FunctionQuery page, I believe that the function
> > is executed on every "row" (or "record", not sure what the proper term
> > for this is). So, my question is threefold really; are those functions
> > the ones I should be using to perform a search where distance is one
> > of the criteria (there are others),
>
> Short answer: yes.  Long answer:  I just committed those functions this week. 
>  I believe they are good, but feedback is encouraged.

I'll be sure to let you know if I find anything report-worthy :)
They're definitely super-useful for people doing similar things to me,
though, so great work :)

> > and if so, does Solr execute the
> > query on every row (and again, if so, is there any way of preventing
> > this [like subqueries, though I know they're not supported])?
>
> You can use the frange capability to filter first.  See 
> http://www.lucidimagination.com/blog/tag/frange/

Thanks for the link. I'll definitely do that. Does Solr execute the
function on every row in the database on every query otherwise?

>
> Here's an example from a soon to be published article I'm writing:
> http://localhost:8983/solr/select/?q=*:*&fq={!frange l=0 u=400}hsin(0.57, 
> -1.3, lat_rad, lon_rad,  3963.205)
>
> This should filter out all documents that are beyond 400 miles in distance 
> from that point on a sphere (specified in radians, see also the rads() method)
>
>
>
> >
> > Sorry if this is a little confusing… any help would be greatly appreciated 
> > :)
>
> No worries, a lot of this spatial stuff is still being ironed out.  See 
> https://issues.apache.org/jira/browse/SOLR-773 for the issue that is tracking 
> all of the related issues.  The pieces are starting to come together and I'm 
> pretty excited about it b/c not only will it bring native spatial support to 
> Solr, it will also give Solr some exciting new general capabilities (sort by 
> function, pseudo-fields, facet by function, etc.)


Output all, from one field

2009-11-23 Thread Jörg Agatz
Hello,

I am looking for a way to output all content from one field.

Like name:

"NAME:*"

and Solr gives me all names

or "color:*"

and I get all colors.

Can I do this? Or is it impossible?

Jörg
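
One way to list every distinct value indexed in a field is faceting over all documents. A sketch (the field name `color` is an assumption; `rows=0` suppresses the documents themselves and `facet.limit=-1` removes the cap on returned values):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=color&facet.limit=-1
```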


Re: Huge load and long response times during search

2009-11-23 Thread Andrey Klochkov
Tom,

AFAIK Lucene performance depends heavily on the file system cache size in
the case of a large index. So if you see lots of IO, it probably means that
your system doesn't have enough memory to hold a file system cache large
enough for your index. In that case you don't need to give more memory to
the Java process; instead, free as much memory as you can for the OS.

On Mon, Nov 23, 2009 at 11:53 AM, Tomasz Kępski  wrote:

> Hi,
>
> Otis Gospodnetic writes:
>
>  Tom,
>>
>> It looks like the machine might simply be running too many things.
>>
> > If the load is around 1 when Solr is not running, and this is a dual-core
> > server, it shows it's already relatively busy (ca. 50% idle).
>
> The server is running PostgreSQL and Apache/PHP as well, but without
> Solr the server condition is more than good (load usually less than 1;
> even during rush hours we observed a 1-minute load average of 0.68).
>
> It is a double dual-core, so load 1 means 25%, am I right (4 cores)?
>
>
>  Your caches are not small, so I am guessing you either have to have a
>> relatively big heap, or your heap is not large enough and it's the GC that's
>> causing high CPU load.
>>
>
> The Java process starts with Xmx3584m. Should that be fine for such cache settings?
> By the way, I'm wondering if we need such caches. I checked the query frequency
> for the last 10 days (~7 unique users): the most frequent phrase appears ~150
> times, and only 11 queries occur more than 100 times. I did not count cases where
> a user repeated the same query to go to the next page.
>
> Is it worth keeping quite a big cache in this case?
>
>
>  If you are seeing Solr causing lots of IO, that's a sign the box doesn't
>> have enough memory for all those servers running comfortably on it.
>>
>
> We do have some free memory to use. The server has 8G RAM and mostly uses up to
> 6G; I haven't seen the swap used yet. I will try to give more RAM to Java
> and use a smaller cache to see if it works.
>
> Tom
>
>
>


-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics
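
As a concrete illustration of the advice above (the start command and heap size are assumptions, not taken from this thread): when Solr shares a box with other services, shrinking the JVM heap leaves more RAM for the OS page cache that Lucene reads the index through.

```
# Hypothetical startup line: cap the heap at 2G so the remaining RAM can be
# used by the OS file system cache instead of being reserved for the JVM.
java -Xms1g -Xmx2g -jar start.jar
```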


RE: schema-based Index-time field boosting

2009-11-23 Thread Ian Smith
Yeah, like I said, I was mistaken about setting field boost in
schema.xml - doesn't mean it's a bad idea though.  At any rate, from
your penultimate sentence I reckon at least one of us is still confused
about field boosting, feel free to reply if you think it's me ;)

Ian.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 21 November 2009 01:54
To: solr-user@lucene.apache.org
Subject: RE: schema-based Index-time field boosting 


: The field boost attribute was put there by me back in the 1.3 days,
when
: I somehow gained the mistaken impression that it was supposed to work!
: Of course, despite a lot of searching I haven't been able to find
: anything to back up my position ;)

Solr has never supported anything like a "boost" parameter on fields in
schema.xml.


: Of course, by now I am convinced that this might be a really good
: feature - I might get the chance to look into it in the near future -
: can anyone think of reasons why this might not work in practice?

field boosting only makes sense if it's applied to only some of the
documents in the index; if every document has an index-time boost on
fieldX, then that boost is meaningless.

are you looking for query time boosting on fields?  like what dismax
provides with the "qf" param?



-Hoss


Web design and intelligent Content Management. www.twitter.com/gossinteractive 

Registered Office: c/o Bishop Fleming, Cobourg House, Mayflower Street, 
Plymouth, PL1 1LG.  Company Registration No: 3553908 





Re: Index time boosts, payloads, and long query strings

2009-11-23 Thread Girish Redekar
Thanks Erick!

After reading your answer, and re-reading the Solr wiki, I realized my
folly. I used to think that index-time boosts when applied on a per-field
basis are equivalent to query time boosts to that field.

To ensure that my new understanding is correct, I'll state it in my own words.
Index time boosts will determine boost for a *document* if it is counted as
a hit. Query time boosts give you control on boosting the occurrence of a
query in a specific field.

Please correct me if I'm wrong (again) :-)

Girish Redekar
http://girishredekar.net


On Sun, Nov 22, 2009 at 8:25 PM, Erick Erickson wrote:

> I still think they are apples and oranges. If you boost *all* titles,
> you're effectively boosting none of them. Index time boosting
> expresses "this document's title is more important than other
> document titles." What I think you're after is "titles are more
> important than other parts of the document.
>
> For this latter, you're talking query-time boosting. Boosting only
> really makes sense if there are multiple clauses, something
> like title:important OR body:unimportant. If this is true, speed
> is irrelevant; you need correct behavior.
>
> Not that I think you'd notice either way. Modern computers
> can do a LOT of FLOPS/sec. Here's an experiment: time
> some queries (but beware of timing the very first ones, see
> the Wiki) with boosts and without boosts. I doubt you'll see
> enough difference to matter (but please do report back if you
> do, it'll further my education ).
>
> But, depending on your index structure, you may get this
> anyway. Generally, matches on shorter fields weigh more
> in the score calculations than on longer fields. If you have
> fields like title and body and you are querying on title:term OR
> body:term, documents with term in the title will tend toward
> higher scores.
>
> But before putting too much effort into this, do you have any
> evidence that the default behavior is unsatisfactory? Because
> unless and until you do, I think this is a distraction ...
>
> Best
> Erick
>
> On Sun, Nov 22, 2009 at 8:37 AM, Girish Redekar
> wrote:
>
> > Hi Erick -
> >
> > Maybe I mis-wrote.
> >
> > My question is: would "title:any_query^4.0" be faster/slower than
> applying
> > index time boost to the field title. Basically, if I take *every* user
> > query
> > and search for it in title with boost (say, 4.0) - is it different than
> > saying field title has boost 4.0?
> >
> > Cheers,
> > Girish Redekar
> > http://girishredekar.net
> >
> >
> > On Sun, Nov 22, 2009 at 2:02 AM, Erick Erickson  > >wrote:
> >
> > > I'll take a whack at index .vs. query boosting. They are expressing
> very
> > > different concepts. Let's claim we're interested in boosting the title
> > > field
> > >
> > > Index time boosting is expressing "this document's title is X more
> > > important
> > >
> > > than a normal document title". It doesn't matter *what* the title is,
> > > any query that matches on anything in this document's title will give
> > this
> > > document a boost. I might use this to give preferential treatment to
> all
> > > encyclopedia entries or something.
> > >
> > > Query time boosting, like "title:solr^4.0" expresses "Any document with
> > > solr
> > > in
> > > it's title is more important than documents without solr in the title".
> > > This
> > > really
> > > only makes sense if you have other clauses that might cause a document
> > > *without*
> > > solr in the title to match.
> > >
> > > Since they are doing different things, efficiency isn't really
> relevant.
> > >
> > > HTH
> > > Erick
> > >
> > >
> > > On Sat, Nov 21, 2009 at 2:13 AM, Girish Redekar
> > > wrote:
> > >
> > > > Hi ,
> > > >
> > > > I'm relatively new to Solr/Lucene, and am using Solr (and not lucene
> > > > directly) primarily because I can use it without writing java code
> > (rest
> > > of
> > > > my project is python coded).
> > > >
> > > > My application has the following requirements:
> > > > (a) ability to search over multiple fields, each with different
> weight
> > > > (b) If possible, I'd like to have the ability to add extra/diminished
> > > > weights to particular tokens within a field
> > > > (c) My query strings have large lengths (50-100 words)
> > > > (d) My index is 500K+  documents
> > > >
> > > > 1) The way to (a) is field boosting (right?). My question is: Is all
> > > field
> > > > boosting done at query time? Even if I give index time boosts to
> > fields?
> > > Is
> > > > there a performance advantage in boosting fields at index time vs at
> > > using
> > > > something like fieldname:querystring^boost.
> > > > 2) From what I've read, it seems that I can do (b) using payloads.
> > > However,
> > > > as this link (
> > > >
> > > >
> > >
> >
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
> > > > )
> > > > suggests, I will have to write a payload aware Query Parser. Wanted
> to
> > > > confirm if this is indeed the case - or is there an out-of-the-box way to
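
Erick's point above, that a uniform boost effectively boosts nothing, can be seen with a toy calculation. This is plain Python, not Lucene code, and the scores are made up; it only models an index-time field boost as a per-document score multiplier:

```python
# Toy model (not Lucene code): treat each document's score for a query as a
# single number, and an index-time field boost as a multiplier on that score.

def rank(scores):
    """Doc ids ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

raw = {"doc1": 1.2, "doc2": 0.7, "doc3": 2.5}

# Boost *every* document's title by the same factor 4.0 ...
uniform = {doc: score * 4.0 for doc, score in raw.items()}

# ... and the ordering is unchanged: a ranking no-op.
print(rank(raw) == rank(uniform))  # True

# Boost only *one* document (what index-time boosts are for) and the
# ordering can change:
selective = dict(raw, doc2=raw["doc2"] * 4.0)  # doc2 -> 2.8
print(rank(selective))  # ['doc2', 'doc3', 'doc1']
```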

Re: Huge load and long response times during search

2009-11-23 Thread Tomasz Kępski

Hi,

Otis Gospodnetic writes:

Tom,

It looks like the machine might simply be running too many things.
If the load is around 1 when Solr is not running, and this is a
dual-core server, it shows it's already relatively busy (ca. 50% idle).


The server is running PostgreSQL and Apache/PHP as well, but without 
Solr the server condition is more than good (load usually less than 1; 
even during rush hours we observed a 1-minute load average of 0.68).


It is a double dual-core, so load 1 means 25%, am I right (4 cores)?

Your caches are not small, so I am guessing you either have to have a relatively big heap, or your heap is not large enough and it's the GC that's causing high CPU load.  


The Java process starts with Xmx3584m. Should that be fine for such cache 
settings? By the way, I'm wondering if we need such caches. I checked the 
query frequency for the last 10 days (~7 unique users): the most frequent 
phrase appears ~150 times, and only 11 queries occur more than 100 times. 
I did not count cases where a user repeated the same query to go to the 
next page.

Is it worth keeping quite a big cache in this case?


If you are seeing Solr causing lots of IO, that's a sign the box doesn't have 
enough memory for all those servers running comfortably on it.


We do have some free memory to use. The server has 8G RAM and mostly uses 
up to 6G; I haven't seen the swap used yet. I will try to give more RAM 
to Java and use a smaller cache to see if it works.


Tom