Re: WordDelimiterFilter to QueryParser to MultiPhraseQuery?

2009-09-03 Thread Shalin Shekhar Mangar
On Mon, Aug 31, 2009 at 10:47 PM, jOhn net...@gmail.com wrote:

 This is mostly my misunderstanding of catenateAll=1 as I thought it would
 break down with an OR using the full concatenated word.

 Thus:

 Jokers Wild -> { jokers, wild } OR { jokerswild }

 But really it becomes: { jokers, {wild, jokerswild}} which will not match.

 And if you have a mistyped camel case like:

 jOkerswild -> { j, {okerswild, jokerswild}} again no match.


Sorry for the late reply. You still haven't given the fieldtype definition
that you were using.

I tried:

<fieldtype name="wdf_preserve_catenate" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
      preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

And I tried indexing "Jokers Wild", which matches when I query for
"jOkerswild" and "jokerswild". Note that if you change the tokenizer to
WhitespaceTokenizer then such queries won't match.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem querying for a value with a space

2009-09-03 Thread Shalin Shekhar Mangar
On Thu, Sep 3, 2009 at 1:45 AM, Adam Allgaier allgai...@yahoo.com wrote:


 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>
 ...
 <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

 I am indexing the specific_LIST_s with the value "For Sale".
 The document indexes just fine.  A query returns the document with the
 proper value:
<str name="specific_LIST_s">For Sale</str>

 However, when I try to query on that field
+specific_LIST_s:For Sale
+specific_LIST_s:For+Sale
+specific_LIST_s:For%20Sale

 I get no results with any one of those three queries.


Use +specific_LIST_s:(For Sale)
or
+specific_LIST_s:"For Sale"
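
In SolrJ -- a minimal sketch with the same hypothetical field -- the client URL-encodes the quotes and the space for you:

SolrQuery q = new SolrQuery("+specific_LIST_s:\"For Sale\"");
QueryResponse rsp = server.query(q);
// over plain HTTP the same query must be percent-encoded:
// /select?q=%2Bspecific_LIST_s%3A%22For%20Sale%22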

-- 
Regards,
Shalin Shekhar Mangar.


Re: Return 2 fields per facet.. name and id, for example? / facet value search

2009-09-03 Thread Shalin Shekhar Mangar
On Fri, Aug 28, 2009 at 12:57 AM, Rihaed Tan tanrihae...@gmail.com wrote:

 Hi,

 I have a similar requirement to Matthew (from his post 2 years ago). Is
 this
 still the way to go in storing both the ID and name/value for facet values?
 I'm planning to use an id#name format if this is still the case and doing a
 prefix query. I believe this is a common requirement, so I'd appreciate it if
 any of you guys can share the best way to do it.

 Also, I'm indexing the facet values for text search as well. Should the
 field declaration below satisfy the requirement?

 <field name="category" type="text" indexed="true" stored="true"
 required="true" multiValued="true"/>


There has been talk of a pair field type in Solr but there is no
patch yet. So I guess the approach proposed by Yonik is a good solution.
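
As an illustration of the id#name approach (a sketch only, assuming SolrJ and a "category" field holding values like "123#7-eleven"; the names are hypothetical):

SolrQuery q = new SolrQuery("*:*");
q.setFacet(true).addFacetField("category");
QueryResponse rsp = server.query(q);
// each facet value carries both pieces; split on the first '#'
for (FacetField.Count c : rsp.getFacetField("category").getValues()) {
    String[] idAndName = c.getName().split("#", 2);
    System.out.println(idAndName[0] + " / " + idAndName[1] + " (" + c.getCount() + ")");
}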

-- 
Regards,
Shalin Shekhar Mangar.


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread Uri Boness
The development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing - adjacent and non-adjacent:
the former collapses only documents which happen to be located next to
each other in the otherwise-non-collapsed result set. The latter (the
non-adjacent one) collapses all documents with the same field value
(regardless of their position in the otherwise-non-collapsed result
set). Note that non-adjacent collapsing performs better than adjacent.
There's currently a discussion about extending this support so that, in
addition to collapsing the documents, extra information will be returned
for the collapsed documents (see the discussion on the issue page).


Uri

R. Tan wrote:

I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:

  

Hi Solrers,
I would like to get your opinion on how to best approach a search
requirement that I have. The scenario is I have a set of business listings
that may be grouped into one parent business (such as 7-Eleven having several
locations). On the results page, I only want 7-Eleven to show up once but
also show how many locations matched the query (facet filtered by state, for
example) and maybe a preview of some of the locations.

Searching for the business name is straightforward, but the locations within
a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but it will still basically be the
same thing and repeat results with the same business name.

Any advice?

Thanks,
R




  


Exact Word Search

2009-09-03 Thread bhaskar chandrasekar
Hi,
 
Can anyone help me with the below scenario?
 
Scenario:
 
I have integrated Solr with Carrot2.
The issue is:
Assuming I give "bhaskar" as the input string for search,
it should give me search results pertaining to "bhaskar" only.
 Example: It should not display search results such as "chandarbhaskar" or
 "bhaskarc".
 Basically search should happen based on the exact word match. I am not
bothered about case sensitivity here.
 How to achieve the above scenario in Carrot2?
 
Regards
Bhaskar
 


  

Solr question

2009-09-03 Thread SEZNEC Bruno
Hi,
 
Following the Solr tutorial,
I send a doc to Solr by request:
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F "myfile=@oxiane.pdf"
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst>
</response>

Reply seems OK, content is in the index,
but afterwards no query matches the doc...
 
TIA
Regards
Bruno
 


Re: questions about solr

2009-09-03 Thread Shalin Shekhar Mangar
On Wed, Sep 2, 2009 at 10:44 PM, Zhenyu Zhong zhongresea...@gmail.comwrote:

 Dear all,

 I am very interested in Solr and would like to deploy Solr for distributed
 indexing and searching. I hope you are the right Solr expert who can help
 me
 out.
 However, I have concerns about the scalability and management overhead of
 Solr. I am wondering if anyone could give me some guidance on Solr.

 Basically, I have the following questions,
 For indexing
 1.  How does Solr handle the distributed indexing? It seems Solr generates
 index on a single box. What if the index is huge and can't sit on one box?


Solr leaves the distribution of the index up to the user. So if you think your
index will not fit in one box, you figure out a sharding strategy (such as
hashing or round-robin) and index your collection into each shard.

Solr supports distributed search so that your query can use all the shards
to give you the results.
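
A minimal sketch of both halves (hypothetical shard URLs; hash-on-id routing and SolrJ's CommonsHttpSolrServer assumed):

String[] urls = { "http://host1:8983/solr", "http://host2:8983/solr" };
SolrServer[] shards = new SolrServer[urls.length];
for (int i = 0; i < urls.length; i++) {
    shards[i] = new CommonsHttpSolrServer(urls[i]);
}

// indexing: route each document to one shard by hashing its unique key
String id = "doc-42";
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
shards[Math.abs(id.hashCode()) % shards.length].add(doc);

// searching: one node queries all shards and merges the results
SolrQuery q = new SolrQuery("ipod");
q.set("shards", "host1:8983/solr,host2:8983/solr");
QueryResponse rsp = shards[0].query(q);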


 2.  Is it possible for Solr to generate index in HDFS?


Never tried but it seems so. See Jason's response and the Jira issue he has
mentioned.


 For searching
 3.  Solr provides Master/Slave framework. How does the Solr distribute the
 search? Does Solr know which index/shard to deliver the query to? Or does
 it
 have to do a multicast query to all the nodes?


For a full-text search it is hard to figure out the correct shards because
matching documents could be living anywhere (unless you shard in a very
clever way and your data can be sharded in that way). Each shard is queried,
the results are merged and returned as if you had queried a single Solr
server.


 For fault tolerance
 4. Does Solr handle the management overhead automatically? suppose master
 goes down, how does Solr recover the master in order to get the latest
 index
 updates?

   Do we have to code ourselves to handle this?


It does not. You have to handle that yourself currently. Similar topics have
been discussed on this list in the past and some workarounds have been
suggested. I suggest you search the archives.


 5. Suppose master goes down immediately after the index updates, while the
 updates haven't been replicated to the slaves, data loss seems to happen.
 Does Solr have any mechanism to deal with that?


No. If you want, you can set up a backup master and index on both the master
and backup machines to achieve redundancy. However, switching between the
master and the backup would need to be done by you.


 Performance of real-time index updating
 6. How is the performance of this realtime index updating? Suppose we are
 updating a million records for a huge index with billions of records
 frequently. Can Solr provides a reasonable performance and low latency on
 that? (Probably it is related to Lucene library)


How frequently? With careful sharding, you can distribute your write load.
Depending on your data, you may also be able to split your indexes into a
more frequently updated one and an older archive index.

A lot of work is in progress in this area. Lucene 2.9 has support for near
real time search with more improvements planned in the coming days. Solr 1.4
will not have support for these new Lucene features but with 1.5 things
should be a lot better.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread R. Tan
Thanks Uri. How does paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:

 The development on this patch is quite active. It works well for a single
 Solr instance, but distributed search (i.e. shards) is not yet supported.
 Using this patch you can group search results based on a specific field.
 There are two flavors of field collapsing - adjacent and non-adjacent: the
 former collapses only documents which happen to be located next to each other
 in the otherwise-non-collapsed result set. The latter (the non-adjacent) one
 collapses all documents with the same field value (regardless of their
 position in the otherwise-non-collapsed result set). Note that
 non-adjacent collapsing performs better than adjacent. There's currently a
 discussion about extending this support so that, in addition to collapsing
 the documents, extra information will be returned for the collapsed documents
 (see the discussion on the issue page).

 Uri


 R. Tan wrote:

 I think this is what I'm looking for. What is the status of this patch?

 On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:



 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business
 listings
 that may be grouped into one parent business (such as 7-Eleven having
 several
 locations). On the results page, I only want 7-Eleven to show up once but
 also show how many locations matched the query (facet filtered by state,
 for
 example) and maybe a preview of some of the locations.

 Searching for the business name is straightforward, but the locations
 within
 a result are quite tricky. I can do the opposite, searching for the
 locations and faceting on business names, but it will still basically be
 the
 same thing and repeat results with the same business name.

 Any advice?

 Thanks,
 R









Question: How do I run the solr analysis tool programtically ?

2009-09-03 Thread Yatir

From Java code I want to contact Solr over HTTP and supply a text buffer
(or a URL that returns text, whatever is easier), and I want to get in return
the final list of tokens (or the final text buffer) after it went through
all the query-time filters defined for this Solr instance (stemming, stop
words etc.)
thanks in advance

-- 
View this message in context: 
http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question: How do I run the solr analysis tool programtically ?

2009-09-03 Thread Chris Male
Hi Yatir,

The FieldAnalysisRequestHandler has the same behavior as the analysis tool.
It will show you the list of tokens that are created after each of the
filters have been applied.  It can be used through normal HTTP requests, or
you can use SolrJ's support.
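
For example via SolrJ (a sketch assuming Solr 1.4's FieldAnalysisRequest class and the /analysis/field handler registered in solrconfig.xml):

FieldAnalysisRequest req = new FieldAnalysisRequest("/analysis/field");
req.addFieldName("text");                  // run this field's analyzer chains
req.setFieldValue("The Quick Brown Fox");  // passed through the index-time chain
req.setQuery("quick foxes");               // passed through the query-time chain
FieldAnalysisResponse rsp = req.process(server);
// rsp contains, for the tokenizer and each filter, the token list it produced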

Thanks,
Chris

On Thu, Sep 3, 2009 at 12:42 PM, Yatir yat...@outbrain.com wrote:


 From Java code I want to contact Solr over HTTP and supply a text buffer
 (or a URL that returns text, whatever is easier) and I want to get in
 return
 the final list of tokens (or the final text buffer) after it went through
 all the query-time filters defined for this Solr instance (stemming, stop
 words etc.)
 thanks in advance

 --
 View this message in context:
 http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr question

2009-09-03 Thread Erik Hatcher


On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote:


Hi,

Following the Solr tutorial,
I send a doc to Solr by request:
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F "myfile=@oxiane.pdf"
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst>
</response>

Reply seems OK, content is in the index,
but afterwards no query matches the doc...


Not even a *:* query?  What queries are you trying?  What's your
default search field?  What does the query parse to, as seen in the
response using debugQuery=true?  Likely the problem is that you
aren't searching on the field the content was indexed into, or that it
was not analyzed as you need.


Erik



Re: Exact Word Search

2009-09-03 Thread Shalin Shekhar Mangar
On Thu, Sep 3, 2009 at 1:33 PM, bhaskar chandrasekar
bas_s...@yahoo.co.inwrote:

 Hi,

 Can anyone help me with the below scenario?

 Scenario:

 I have integrated Solr with Carrot2.
 The issue is:
 Assuming I give "bhaskar" as the input string for search,
 it should give me search results pertaining to "bhaskar" only.
  Example: It should not display search results such as "chandarbhaskar" or
  "bhaskarc".
  Basically search should happen based on the exact word match. I am not
 bothered about case sensitivity here.
  How to achieve the above scenario in Carrot2?


Bhaskar, I think this question is better suited for the Carrot mailing
lists. Unless you yourself control how the solr query is created, we will
not be able to help you.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Using SolrJ with Tika

2009-09-03 Thread Abdullah Shaikh
Hi Laurent,

I am not sure if this is what you need, but you can extract the content from
the uploaded document (MS Docs, PDF etc) using TIKA and then send it to SOLR
for indexing.

String CONTENT = ...; // extract the content using Tika (you can use AutoDetectParser)

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", CONTENT);

solrServer.add(doc);
solrServer.commit();
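
For completeness, an end-to-end sketch (editor's illustration, assuming Tika's AutoDetectParser/BodyContentHandler and SolrJ's CommonsHttpSolrServer; the file name and field names are made up):

AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
Metadata metadata = new Metadata();
InputStream in = new FileInputStream("oxiane.pdf");
try {
    parser.parse(in, handler, metadata); // Tika detects the format itself
} finally {
    in.close();
}

SolrServer solrServer = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
doc.addField("DOC_CONTENT", handler.toString()); // the extracted plain text
solrServer.add(doc);
solrServer.commit();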


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote:

 Hi everybody.

 I hope it's the right place for questions, if not sorry.

 I'm trying to index rich documents (PDF, MS docs etc) in Solr/Lucene.
 I have seen a few examples explaining how to use Tika to solve this. But
 most of these examples are using curl to send documents to Solr or an HTML
 POST with an input file.
 But I'd like to do it in full Java.
 Is there a way to use SolrJ to index the documents with the
 ExtractingRequestHandler of Solr or at least to get the extracted XML back
 (with the extract.only option)?

 Many thanks.

 Laurent.






Indexing docs using TIKA

2009-09-03 Thread Abdullah Shaikh
I am not sure if this went to the mailing list before... hence forwarding again

Hi All,

I want to search for documents containing "string to search", with price
between 100 and 200 and weight between 10 and 20.

SolrQuery query = new SolrQuery();
query.setQuery("DOC_CONTENT:\"string to search\"");

query.setFilterQueries("PRICE:[100 TO 200]");
query.setFilterQueries("WEIGHT:[10 TO 20]");

QueryResponse response = server.query(query);

The DOC_CONTENT contains the content extracted from the file uploaded by the
user, extracted using TIKA.

Is the above approach correct ?
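
One caveat on the snippet above: SolrQuery.setFilterQueries(...) replaces any previously set filter queries, so the second call drops the PRICE filter. SolrJ's addFilterQuery(...) appends instead -- a minimal sketch:

query.setQuery("DOC_CONTENT:\"string to search\"");
query.addFilterQuery("PRICE:[100 TO 200]");  // both filters stay in effect
query.addFilterQuery("WEIGHT:[10 TO 20]");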


Re : Using SolrJ with Tika

2009-09-03 Thread Angel Ice
Hi

This is the solution I was testing.
I got some difficulties with AutoDetectParser but I think it's the solution I 
will use in the end.


Thanks for the advice anyway :)

Regards,

Laurent





From: Abdullah Shaikh abdullah.sha...@viithiisys.com
To: solr-user@lucene.apache.org
Sent: Thursday, 3 September 2009, 14:31:10
Subject: Re: Using SolrJ with Tika

Hi Laurent,

I am not sure if this is what you need, but you can extract the content from
the uploaded document (MS Docs, PDF etc) using TIKA and then send it to SOLR
for indexing.

String CONTENT = ...; // extract the content using Tika (you can use AutoDetectParser)

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", CONTENT);

solrServer.add(doc);
solrServer.commit();


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote:

 Hi everybody.

 I hope it's the right place for questions, if not sorry.

 I'm trying to index rich documents (PDF, MS docs etc) in Solr/Lucene.
 I have seen a few examples explaining how to use Tika to solve this. But
 most of these examples are using curl to send documents to Solr or an HTML
 POST with an input file.
 But I'd like to do it in full Java.
 Is there a way to use SolrJ to index the documents with the
 ExtractingRequestHandler of Solr or at least to get the extracted XML back
 (with the extract.only option)?

 Many thanks.

 Laurent.







  

RE: Solr question

2009-09-03 Thread SEZNEC Bruno

 Thanks
My idea was that if I have
<dynamicField name="attr_*" type="textgen" indexed="true" stored="true"
multiValued="true"/>
in schema.xml,
everything would be stored in the index.
The query "solr" or other terms work well only with the text given in the sample
files.
Rgds
Bruno


 -----Original Message-----
 From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
 Sent: Thursday, 3 September 2009 13:40
 To: solr-user@lucene.apache.org
 Subject: Re: Solr question
 
 
 On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote:
 
  Hi,
 
  Following the Solr tutorial,
  I send a doc to Solr by request:
  curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F "myfile=@oxiane.pdf"
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst>
  </response>
 
  Reply seems OK, content is in the index, but afterwards no query matches
  the doc...
 
 Not even a *:* query?  What queries are you trying?  What's
 your default search field?  What does the query parse to, as
 seen in the response using debugQuery=true?  Likely the problem is that
 you aren't searching on the field the content was indexed into,
 or that it was not analyzed as you need.
 
   Erik
 
 


Re: score = sum of boosts

2009-09-03 Thread Walter Underwood
You could start with a TF formula that ignores frequencies above 1.  
onOffTF, I guess, returning 1 if the term is there one or more times.
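
In Lucene terms that is a tiny custom Similarity -- a sketch against the 2.4-era API; the class name is made up:

public class OnOffTFSimilarity extends org.apache.lucene.search.DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // count a term once, no matter how often it occurs
    }
}

It can then be plugged in via the <similarity class="..."/> element in schema.xml.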


Or, you could tell us what you are trying to achieve.

wunder

On Sep 3, 2009, at 12:28 AM, Shalin Shekhar Mangar wrote:

On Thu, Sep 3, 2009 at 4:09 AM, Joe Calderon  
calderon@gmail.com wrote:



hello *, what would be the best approach to return the sum of boosts
as the score?

ex:
a dismax handler boosts matches to field1^100 and field2^50, a query
matches both fields hence the score for that row would be 150


Not really. The tf-idf score would be multiplied by 100 for field1 and by 50
for field2. The score can be more than 150 if both fields match.




is this something i could do with a function query or do i need to
hack up DisjunctionMaxScorer ?


Can you give a little more background on what you want to achieve  
this way?


--
Regards,
Shalin Shekhar Mangar.




Best way to do a lucene matchAllDocs not using q.alt=*:*

2009-09-03 Thread Marc Sturlese

Hey there,
I need a query to get the total number of documents in my index. I can get
it if I do this using the DismaxRequestHandler:
q.alt=*:*&facet=false&hl=false&rows=0
I have noticed this query is very memory consuming. Is there a more
optimized way in trunk to get the total number of documents in my index?
Thanks in advance

-- 
View this message in context: 
http://www.nabble.com/Best-way-to-do-a-lucene-matchAllDocs-not-using-q.alt%3D*%3A*-tp25277585p25277585.html
Sent from the Solr - User mailing list archive at Nabble.com.



Default Query Type For Facet Queries

2009-09-03 Thread Stephen Duncan Jr
We have a custom query parser plugin registered as the default for searches,
and we'd like to have the same parser used for facet.query.

Is there a way to register it as the default for FacetComponent in
solrconfig.xml?

I know I can add {!type=customparser} to each query as a workaround, but I'd
rather register it in the config than make my code send that and strip it
off on every facet query.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


RE: Solr question

2009-09-03 Thread SEZNEC Bruno
Response with id:doc4 is OK

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">id:doc4</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <arr name="attr_Author"><str>Sami Siren</str></arr>
    <arr name="attr_Content-Type"><str>application/pdf</str></arr>
    <arr name="attr_content">
      <str>
        Example PDF document Tika Solr Cell
        This is a sample piece of content for Tika Solr Cell article.
      </str>
    </arr>
    <arr name="attr_created"><str>Wed Dec 31 10:17:13 CET 2008</str></arr>
    <arr name="attr_creator"><str>Writer</str></arr>
    <arr name="attr_producer"><str>OpenOffice.org 3.0</str></arr>
    <arr name="attr_stream_content_type"><str>application/octet-stream</str></arr>
    <arr name="attr_stream_name"><str>SampleDocument.pdf</str></arr>
    <arr name="attr_stream_size"><str>18408</str></arr>
    <arr name="attr_stream_source_info"><str>myfile</str></arr>
    <str name="id">doc4</str>
    <str name="title">Example PDF document</str>
  </doc>
</result>
</response>

What I don't understand is why a simple search on title or content
doesn't work:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">PDF</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

Thanks 

 -----Original Message-----
 From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
 Sent: Thursday, 3 September 2009 13:40
 To: solr-user@lucene.apache.org
 Subject: Re: Solr question
 
 
 On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote:
 
  Hi,
 
  Following the Solr tutorial,
  I send a doc to Solr by request:
  curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F "myfile=@oxiane.pdf"
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst>
  </response>
 
  Reply seems OK, content is in the index, but afterwards no query matches
  the doc...
 
 Not even a *:* query?  What queries are you trying?  What's
 your default search field?  What does the query parse to, as
 seen in the response using debugQuery=true?  Likely the problem is that
 you aren't searching on the field the content was indexed into,
 or that it was not analyzed as you need.
 
   Erik
 
 


how to scan dynamic field without specifying each field in query

2009-09-03 Thread gdeconto

say I have a dynamic field called Foo* (where * can be in the hundreds) and
want to search Foo* for a value of 3 (for example)

I know I can do this via:

http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ...
Foo999:3)

However, is there a better way?  i.e. is there some way to query by a
function I create, possibly something like this:

http://localhost:8994/solr/select?q=myfunction('Foo', 3)

where myfunction itself iterates thru all the instances of Foo*

any help appreciated

-- 
View this message in context: 
http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: how to scan dynamic field without specifying each field in query

2009-09-03 Thread Manepalli, Kalyan
You can copy the dynamic fields value into a different field and query on that 
field.

Thanks,
Kalyan Manepalli

-Original Message-
From: gdeconto [mailto:gerald.deco...@topproducer.com] 
Sent: Thursday, September 03, 2009 12:06 PM
To: solr-user@lucene.apache.org
Subject: how to scan dynamic field without specifying each field in query


say I have a dynamic field called Foo* (where * can be in the hundreds) and
want to search Foo* for a value of 3 (for example)

I know I can do this via this:

http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ...
Foo999:3)

However, is there a better way?  i.e. is there some way to query by a
function I create, possibly something like this:

http://localhost:8994/solr/select?q=myfunction('Foo', 3)

where myfunction itself iterates thru all the instances of Foo*

any help appreciated

-- 
View this message in context: 
http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to scan dynamic field without specifying each field in query

2009-09-03 Thread Avlesh Singh

 I know I can do this via: http://localhost:8994/solr/select?q=(Foo1:3 OR
 Foo2:3 OR Foo3:3 OR ... Foo999:3)

Careful! You may hit the upper limit for MAX_BOOLEAN_CLAUSES this way.


 You can copy the dynamic fields value into a different field and query on
 that field.

Good idea!

Cheers
**Avlesh

On Thu, Sep 3, 2009 at 10:47 PM, Manepalli, Kalyan 
kalyan.manepa...@orbitz.com wrote:

 You can copy the dynamic fields value into a different field and query on
 that field.

 Thanks,
 Kalyan Manepalli

 -Original Message-
 From: gdeconto [mailto:gerald.deco...@topproducer.com]
 Sent: Thursday, September 03, 2009 12:06 PM
 To: solr-user@lucene.apache.org
 Subject: how to scan dynamic field without specifying each field in query


 say I have a dynamic field called Foo* (where * can be in the hundreds) and
 want to search Foo* for a value of 3 (for example)

 I know I can do this via this:

  http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ...
  Foo999:3)

 However, is there a better way?  i.e. is there some way to query by a
 function I create, possibly something like this:

  http://localhost:8994/solr/select?q=myfunction('Foo', 3)

 where myfunction itself iterates thru all the instances of Foo*

 any help appreciated

 --
 View this message in context:
 http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html
 Sent from the Solr - User mailing list archive at Nabble.com.




RE: how to scan dynamic field without specifying each field in query

2009-09-03 Thread gdeconto

thx for the reply.

you mean into a multivalued field?  possible, but I was wondering if there was
something more flexible than that.  the ability to use a function (i.e.
myfunction) would open up some possibilities for more complex searching and
search syntax.

I could write my own query parser with special extended syntax, but that is
farther than I wanted to go.



Manepalli, Kalyan wrote:
 
 You can copy the dynamic fields value into a different field and query on
 that field.
 
 Thanks,
 Kalyan Manepalli
 
 

-- 
View this message in context: 
http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280669.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-09-03 Thread Sascha Szott

Hi Khai,

a few weeks ago, I was facing the same problem.

In my case, this workaround helped (assuming you're using Solr 1.3):
For each row, extract the content from the corresponding PDF file using
a parser library of your choice (I suggest Apache PDFBox, or Apache Tika
in case you need to process other file types as well), put it between


<foo><![CDATA[

and

]]></foo>

and store it in a text file. To keep the relationship between a file and 
its corresponding database row, use the primary key as the file name.
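
A sketch of that extraction step (editor's illustration, assuming PDFBox's PDDocument and PDFTextStripper; "1234" stands in for the primary key):

PDDocument pdf = PDDocument.load(new File("1234.pdf"));
String text = new PDFTextStripper().getText(pdf);
pdf.close();
Writer out = new FileWriter("1234.xml");
// naive CDATA wrapping; content containing "]]>" would need extra escaping
out.write("<foo><![CDATA[" + text + "]]></foo>");
out.close();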


Within data-config.xml use the XPathEntityProcessor as follows (replace 
dbRow and primaryKey respectively):


<entity name="pdfcontent"
        processor="XPathEntityProcessor"
        forEach="/foo"
        url="${dbRow.primaryKey}.xml">
  <field column="pdftext" xpath="/foo"/>
</entity>


And, by the way, in Solr 1.4 you do not have to put your content between 
xml tags: use the PlainTextEntityProcessor instead of XPathEntityProcessor.


Best,
Sascha

Khai Doan schrieb:

Hi all,

My name is Khai.  I have a table in a relational database.  I have
successfully used DataImportHandler to import this data into Apache Solr.
However, one of the columns stores the location of a PDF file.  How can I
configure DataImportHandler to use ExtractingRequestHandler to extract the
content of the PDF?

Thanks!

Khai Doan





Re: how to scan dynamic field without specifying each field in query

2009-09-03 Thread Avlesh Singh
A query parser, maybe.
But that would not help either. At the end of the day, someone has to create
those many boolean queries in your case.

Cheers
Avlesh

On Thu, Sep 3, 2009 at 10:59 PM, gdeconto gerald.deco...@topproducer.comwrote:


 thx for the reply.

 you mean into a multivalued field?  possible, but I was wondering if there was
 something more flexible than that.  the ability to use a function (i.e.
 myfunction) would open up some possibilities for more complex searching and
 search syntax.

 I could write my own query parser with special extended syntax, but that is
 farther than I wanted to go.



 Manepalli, Kalyan wrote:
 
  You can copy the dynamic fields value into a different field and query on
  that field.
 
  Thanks,
  Kalyan Manepalli
 
 

 --
 View this message in context:
 http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280669.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: how to scan dynamic field without specifying each field in query

2009-09-03 Thread Renaud Delbru

Hi,

maybe SIREn [1] can help you with this task. SIREn is a Lucene plugin
that allows you to index and query tabular data. You can for example create
a SIREn field "foo", index n values in n cells, and then query a
specific cell or a range of cells. Unfortunately, the Solr plugin is not
yet available, and therefore you will have to write your own query
syntax and parser for this task.


Regards,

[1] http://siren.sindice.com
--
Renaud Delbru

gdeconto wrote:

thx for the reply.

you mean into a multivalued field?  possible, but I was wondering if there was
something more flexible than that.  the ability to use a function (i.e.
myfunction) would open up some possibilities for more complex searching and
search syntax.

I could write my own query parser with special extended syntax, but that is
farther than I wanted to go.



Manepalli, Kalyan wrote:
  

You can copy the dynamic fields value into a different field and query on
that field.

Thanks,
Kalyan Manepalli





  




Re: how to scan dynamic field without specifying each field in query

2009-09-03 Thread gdeconto

I am thinking that my example was too simple/generic :-U.  It is possible for
several more dynamic fields to exist and other functionality to be required,
i.e. what if my example had read:

http://localhost:8994/solr/select?q=((Foo1:3 OR Foo2:3 OR Foo3:3 OR ...
Foo999:3) AND (Bar1:1 OR Bar2:1 OR Bar3:1...Bar999:1) AND (Etc1:7 OR Etc2:7
OR Etc3:7...Etc999:7))

obviously a nasty query (and care would be needed for MAX_BOOLEAN_CLAUSES).
that said, are there other mechanisms to better handle that type of query,
i.e.:

http://localhost:8994/solr/select?q=(myfunction('Foo', 3) AND
myfunction('Bar', 1) AND myfunction('Etc', 7))


gdeconto wrote:
 
 say I have a dynamic field called Foo* (where * can be in the hundreds)
 and want to search Foo* for a value of 3 (for example)
 
 I know I can do this via this:
 
 http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ...
 Foo999:3)
 
 However, is there a better way?  i.e. is there some way to query by a
 function I create, possibly something like this:
 
 http://localhost:8994/solr/select?q=myfunction('Foo', 3)
 
 where myfunction itself iterates thru all the instances of Foo*
 
 any help appreciated
 
 

-- 
View this message in context: 
http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25283094.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread Uri Boness
The collapsed documents are represented by one "master" document which
can be part of the normal search result (the doc list), so pagination
just works as expected, taking only the returned documents into
account (ignoring the collapsed ones). As for the scoring, the master
document is actually the document with the highest score in the
collapsed group.


As for Solr 1.3 compatibility... well... it's very hard to tell. All
the latest patches are certainly *not* 1.3 compatible (I think they also
depend on some changes in Lucene which are not available for Solr
1.3). I guess you'll have to try some of the old patches, but I'm not
sure about their stability.


cheers,
Uri

R. Tan wrote:

Thanks Uri. How does paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:

  

The development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing - adjacent and non-adjacent: the
former collapses only documents which happen to be located next to each other
in the otherwise-non-collapsed result set. The latter (the non-adjacent) one
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent collapsing performs better than adjacent. There's currently a
discussion about extending this support so that, in addition to collapsing
the documents, extra information will be returned for the collapsed documents
(see the discussion on the issue page).

Uri


R. Tan wrote:



I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:



  

Hi Solrers,
I would like to get your opinion on how to best approach a search
requirement that I have. The scenario is I have a set of business
listings
that may be grouped into one parent business (such as 7-Eleven having
several
locations). On the results page, I only want 7-Eleven to show up once but
also show how many locations matched the query (facet filtered by state,
for
example) and maybe a preview of some of the locations.

Searching for the business name is straightforward, but the locations
within
a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but it will still basically be
the
same thing and repeat results with the same business name.

Any advice?

Thanks,
R






  


  


Re: Best way to do a lucene matchAllDocs not using q.alt=*:*

2009-09-03 Thread Uri Boness

you can use the LukeRequestHandler: http://localhost:8983/solr/admin/luke
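
From SolrJ, a sketch assuming the LukeRequest/LukeResponse classes:

LukeRequest luke = new LukeRequest();  // hits /admin/luke
luke.setNumTerms(0);                   // skip per-field term stats to keep the call cheap
LukeResponse rsp = luke.process(server);
System.out.println("numDocs: " + rsp.getNumDocs());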

Marc Sturlese wrote:

Hey there,
I need a query to get the total number of documents in my index. I can get
it if I do this using the DismaxRequestHandler:
q.alt=*:*&facet=false&hl=false&rows=0
I have noticed this query is very memory consuming. Is there a more
optimized way in trunk to get the total number of documents in my index?
Thanks in advance

  


Using scoring from another program

2009-09-03 Thread Paul Tomblin
Every document I put into Solr has a field origScore which is a
floating point number between 0 and 1 that represents a score assigned
by the program that generated the document.  I would like it that when
I do a query, it uses that origScore in the scoring, perhaps
multiplying the Solr score to find a weighted score and using that to
determine which are the highest scoring matches.  Can I do that?

-- 
http://www.linkedin.com/in/paultomblin


Re: Using scoring from another program

2009-09-03 Thread Uri Boness

Function queries are what you need: http://wiki.apache.org/solr/FunctionQuery
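
For a multiplicative boost specifically, a sketch assuming Solr 1.4's {!boost} parser (the wrapped query is made up):

SolrQuery q = new SolrQuery("{!boost b=origScore}text:solr");
QueryResponse rsp = server.query(q); // each doc's score is multiplied by its origScore value

With dismax, the additive bf=origScore parameter is an alternative.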

Paul Tomblin wrote:

Every document I put into Solr has a field origScore which is a
floating point number between 0 and 1 that represents a score assigned
by the program that generated the document.  I would like it that when
I do a query, it uses that origScore in the scoring, perhaps
multiplying the Solr score to find a weighted score and using that to
determine which are the highest scoring matches.  Can I do that?

  


Sanity check: ResonseWriter directly to a database?

2009-09-03 Thread seanoc5

Hello all,
Are there any hidden gotchas--or even basic suggestions--regarding
implementing something like a DBResponseWriter that puts responses right
into a database? My specific questions are:

1) Any problems adding non-trivial jars to a solr plugin? I'm thinking JDBC
and then perhaps Hibernate libraries?
I don't believe so, but I have just enough understanding to be dangerous at
the moment.

2) Is JSONResponseWriter a reasonable copy/paste starting point for me?  Is
there anything that might match better, especially regarding initialization
and connection pooling?

3) Say I have a read-write single-core solr server: a vanilla-out-of-the-box
example install. Can I concurrently update the underlying index safely with
EmbeddedSolrServer? (This is my backup approach, less preferred)
I assume no, one of them has to be read only, but I've learned not to
under-estimate the lucene/solr developers.  

I'm starting with adapting JSONResponseWriter and the 
http://wiki.apache.org/solr/SolrPlugins wiki notes . The docs seem to
indicate all I need to do is package up the appropriate supporting (jdbc)
jar files into my MyDBResponse.jar, and drop it into the ./lib dir (e.g.
c:\solr-svn\example\solr\lib). Of course, I need to update my solrconfig.xml
to use the new DBResponseWriter.

Straight JDBC seems like the easiest starting point. If that works,
perhaps move the DB stuff to Hibernate.  Does anyone have a best practice
suggestion for database access inside a plugin? I rather expect the answer
might be use JNDI and well-configured hibernate; no special problems
related to 'inside' a solr plugin. I will eventually be interested in
saving both query results and document indexing information, so I expect to
do this in both a (custom) ResponseWriter, and ... um... a
DocumentAnalysisRequestHandler?   

I realize embedded solr might be a better choice (performance has been a big
issue in my current implementation), and I am looking into that as well. If
feasible, I'd like to keep solr in charge of the database content through
plugins and extensions, rather than keeping both solr and db synced from my
(grails) app. 
Thanks,

Sean


-- 
View this message in context: 
http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Clarifications to Synonym Filter Wiki entry? (1 of 2)

2009-09-03 Thread Chris Hostetter
: I believe the following section is a bit misleading; I'm sure it's correct
: for the case it describes, but there's another case I've tested, which on
: the surface seemed similar, but where the actual results were different and
: in hindsight not really a conflict, just a surprise.

the crux of the issue is that *lines* in the file with only commas (no =>)
are ambiguous, and only have meaning once the expand property is evaluated.
once that's done then you have a list of *mappings* ... and it's the
mappings that get merged.

: I tested this by actually looking at the word index with Luke.

FYI: an easy way to test it would probably be the analysis.jsp page

: If you DID want the merged behavior, where D would expand to match all 9
: letters you can either:
: 1: Put the synonym filter in the pipeline twice, along with the remove
: duplicates filter
: OR
: 2: Use the synonym filter at both index and query time

using the filter at query time with expand=true would wreak havoc with
phrase queries ... your best bet is to be more explicit when expressing
the mappings in the file.

: And what should be added to the Wiki doc?

Add whatever you think would help ... users discovering behavior for the
first time are the best people to write documentation, because the devs
who know the code really well don't appreciate what isn't obvious.



-Hoss



Single Core or Multiple Core?

2009-09-03 Thread Jonathan Ariel
It seems like it is really hard to decide when the multiple core solution is
more appropriate. As I understand from this list and the wiki, the multiple
core feature was designed to address the need of handling different sets of
data within the same Solr instance, where the sets of data don't need to be
joined.
In my case the documents are of a specific site and country. So document A
can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 / Country
2, and so on.
For the use cases of my application I will never query across countries or
sites. I will always have to provide to the query the country id and the
site id.
Would you suggest to split my data into cores? I have few sites (around 20)
and more countries (around 90).
Should I split my data into sites (around 20 cores) and within a core filter
by site? Should I split by Site and Country (around 1800 cores)?
What should I consider when splitting my data into multiple cores?

Thanks

Jonathan


Re: Searching with or without diacritics

2009-09-03 Thread Chris Hostetter

Take a look at the MappingCharFilterFactory (in Solr 1.4) and/or the 
ISOLatin1AccentFilterFactory.

: Date: Thu, 27 Aug 2009 16:30:08 +0200
: From: György Frivolt gyorgy.friv...@gmail.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user solr-user@lucene.apache.org
: Subject: Searching with or without diacritics
: 
: Hello,
: 
:  I started to use Solr only recently using the ruby/rails sunspot-solr
: client. I use Solr on a Slovak/Czech data set and realized one unwanted
: behaviour of the search. When the user searches for an expression or word
: which contains diacritics, letters like š, č, ť, ä, ô,... usually the special
: characters are omitted in the search query. In this case Solr does not
: return records which contain the expression intended to be found by the
: user.
:  How can I configure Solr in a way that it finds records containing
: special characters, even if they are without special accents in the query?
: 
:  Some info about my Solr instance: Solr Specification Version: 1.3.0; Solr
: Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12
: 11:06:47; Lucene Specification Version: 2.4-dev; Lucene Implementation
: Version: 2.4-dev 691741 - 2008-09-03 15:25:16
: 
: Thank for your help, regards,
: 
:  Georg
: 



-Hoss


Re: SnowballPorterFilterFactory stemming word question

2009-09-03 Thread Chris Hostetter

: If i give machine why is that it stems to machin, now from where does
: this word come from
: If i give revolutionary it stems to revolutionari, i thought it should
: stem to revolution.
: 
: How does stemming work?

the porter stemmer (and all of the stemmers provided with solr) are
programmatic stemmers ... they don't actually know the root of any words; they
use an approximate algorithm to compute a *token* from a word based on a
set of rules ... these tokens aren't necessarily real words (and most of
the time they aren't words) but the same token tends to be produced from
words with similar roots.

if you want to see the actual root word, you'll have to use a dictionary
based stemmer.


-Hoss



Re: Impact of compressed=true attribute (in schema.xml) on Indexing/Query

2009-09-03 Thread Chris Hostetter

: Now the question is, how the compressed=true flag impacts the indexing 
: and Querying operations. I am sure that there will be CPU utilization 
: spikes as there will be operation of compressing(during indexing) and 
: uncompressing(during querying) of the indexed data. I am mainly looking 
: for any bench marks for the above scenario.

i don't have any hard numbers for you, but the stored data isn't
uncompressed when executing a query -- queries are executed against the
indexed terms (which are never compressed) ... the only time the data will
be uncompressed is when returning results to the client -- so if you set
rows=17 in your request, only the values for the 17 docs returned (or
fewer if there were fewer than 17 matches) will be uncompressed.



-Hoss



Re: Optimal Cache Settings, complicated by regular commits

2009-09-03 Thread Chris Hostetter

: I'm trying to work out the optimum cache settings for our Solr server, I'll
: begin by outlining our usage.

...but you didn't give any information about what your cache settings look 
like ... size is only part of the picture, the autowarm counts are more 
significant.

: Commit frequency: sometimes we do massive amounts of sequential commits,

if you know you are going to be indexing more docs soon, then you can hold 
off on issuing a commit ... it really comes down to what kind of SLA you 
have to provide on how quickly an add/update is visible in the index -- 
don't commit any more often than that.

: The problem we have is that the default cache settings resulting in very low
: hit rates (less than 30% for documents, less than 1% for filterCache), so we

under 1% for filterCache sounds like you either have some really unique 
filter queries, or you are using enum based faceting on a huge field and 
the LRU cache is working against you by expunging values during a single 
request ... what version of solr are you using? what do the fieldtype
declarations look like for the fields you are faceting on? what do the
luke stats look like for the fields you are faceting on?

: now we have the issue of commits being very slow (more than 5 seconds for a
: document), to the point where it causes a timeout elsewhere in our systems.
: This is made worse by the fact that committing seems to empty the cache,
: given that it takes about an hour to get the cache to a good state this is
: obviously very problematic.

1) using waitSearcher=false can help speed up the commit if all you care
about is not having your client time out.

2) using autowarming can help fill the caches up prior to users making 
requests (you may already know that, but since you didn't provide your 
cache configs i have no idea) ... the key is finding a good autowarm count
that helps your cache stats w/o taking too long to fill up.


-Hoss



Re: Sorting performance + replication of index between cores

2009-09-03 Thread Sreeram Vaidyanathan

Did you guys find a solution?
I am having a similar issue.

Setup:
One indexer box and 2 searcher boxes, each having 6 different Solr cores.
We have a lot of updates (in the range of a couple thousand items every few
mins).
The snappuller/snapinstaller pulls and commits every 5 mins.

Query response time peaks at 60+ seconds when a new searcher is being
prepared.
I have disabled the caches (filter, query and document).

We have a strict requirement of response time < 10 secs all the time.

Thanks
Sreeram


sunnyfr wrote:
 
 Hi Christophe, 
 
 Did you find a way to fix your problem? Even with replication you will
 have this problem; lots of updates mean clearing the cache and managing that.
 I have the same issue, and I am just wondering if I should turn off servers
 during updates?
 How did you fix that?
 
 Thanks,
 sunny
 
 
 christophe-2 wrote:
 
 Hi,
 
 After fully reloading my index, using another field than a Date does not
 help that much.
 Using a warmup query avoids having the first request slow, but:
  - Frequent commits mean that the Searcher is reloaded frequently
 and, as the warmup takes time, the clients must wait.
  - Having warmup slows down the index process (I guess this is
 because after a commit, the Searchers are recreated)
 
 So I'm considering, as suggested,  to have two instances: one for 
 indexing and one for searching.
 I was wondering if there are simple ways to replicate the index in a 
 single Solr server running two cores ? Any such config already tested ? 
 I guess that the standard replication based on rsync can be simplified a 
 lot in this case as the two indexes are on the same server.
 
 Thanks
 Christophe
 
 Beniamin Janicki wrote:
 :so you can send your updates anytime you want, and as long as you only 
 :commit every 5 minutes (or commit on a master as often as you want, but 
 :only run snappuller/snapinstaller on your slaves every 5 minutes) your 
 :results will be at most 5minutes + warming time stale.

 This is what I do as well ( commits are done once per 5 minutes ). I've
 got
 master - slave configuration. Master has turned off all caches
 (commented in
 solrconfig.cml) and setup only 2 maxWarmingSearchers. Index size has 5GB
 ,Xmx= 1GB and committing takes around 10 secs ( on default configuration
 with warming it took from 30 mins up to 2 hours). 

 Slave caches are configured to have autowarmCount=0 and
 maxWarmingSearchers=1 , and I have new data 1 second after snapshoot is
 done. I haven't noticed any huge delays while serving search request.
 Try to use those values - may be they'll help in your case too.

 Ben Janicki


 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
 Sent: 22 October 2008 04:56
 To: solr-user@lucene.apache.org
 Subject: Re: Sorting performance


 : The problem is that I will have hundreds of users doing queries, and a
  : continuous flow of documents coming in.
  : So a delay in warming up a cache could be acceptable if I do it a few
  : times per day. But not on a too regular basis (right now, the first query
  : that loads the cache takes 150s).
 : 
 : However: I'm not sure why it looks not to be a good idea to update the
 caches

 you can refresh the caches automatically after updating, the newSearcher 
 event is fired whenever a searcher is opened (but before it's used by 
 clients) so you can configure warming queries for it -- it doesn't have
 to 
 be done manually (or by the first user to use that reader)

 so you can send your updates anytime you want, and as long as you only 
 commit every 5 minutes (or commit on a master as often as you want, but 
 only run snappuller/snapinstaller on your slaves every 5 minutes) your 
 results will be at most 5minutes + warming time stale.


 -Hoss

   
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Sorting-performance-tp20037712p25286018.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re : Using SolrJ with Tika

2009-09-03 Thread Grant Ingersoll

See https://issues.apache.org/jira/browse/SOLR-1411

On Sep 3, 2009, at 6:47 AM, Angel Ice wrote:


Hi

This is the solution I was testing.
I got some difficulties with AutoDetectParser but I think it's the  
solution I will use in the end.



Thanks for the advice anyway :)

Regards,

Laurent





From: Abdullah Shaikh abdullah.sha...@viithiisys.com
To: solr-user@lucene.apache.org
Sent: Thursday, 3 September 2009, 14:31:10
Subject: Re: Using SolrJ with Tika

Hi Laurent,

I am not sure if this is what you need, but you can extract the  
content from
the uploaded document (MS Docs, PDF etc) using TIKA and then send it  
to SOLR

for indexing.

String CONTENT = ...; // extract the content using Tika (you can use AutoDetectParser)

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", CONTENT);

solrServer.add(doc);
solrServer.commit();


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote:


Hi everybody.

I hope it's the right place for questions, if not sorry.

I'm trying to index rich documents (PDF, MS docs etc) in SolR/Lucene.
I have seen a few examples explaining how to use tika to solve  
this. But
most of these examples are using curl to send documents to Solr or  
an HTML

POST with an input file.
But i'd like to do it in full java.
Is there a way to use Solrj to index the documents with the
ExtractingRequestHandler of SolR or at least to get the extracted  
xml back

(with the extract.only option) ?

Many thanks.

Laurent.










--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Solr, JNDI config, dataDir, and solr home problem

2009-09-03 Thread Archon810

Here's my problem.

I'm trying to follow a multi Solr setup, straight from the Solr wiki -
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac.

Here's the relevant code:
<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
value="/some/path/solr1home" override="true"/>
</Context>

Now I want to set the Solr <dataDir> in solrconfig.xml, relative to
the solr home property. The instructions
http://wiki.apache.org/solr/SolrConfigXml#head-e8fbf2d748d90c5900aac712d0e3385ced5bd128
say <dataDir> "is used to specify an alternate directory to hold all
index data other than the default ./data under the Solr home. If replication
is in use, this should match the replication configuration. If this
directory is not absolute, then it is relative to the current working
directory of the servlet container."

However, no matter how I try to set the dataDir property, solr home is not
being found. For example:
  <dataDir>${solr.home}/data</dataDir>

What's even more confusing are these INFO notices in the log:
INFO: No /solr/home in JNDI
Sep 3, 2009 4:33:26 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)

The JNDI instructions say to specify solr/home, the log complains
about /solr/home (extra slash), and the solrconfig.xml file seems to expect
${solr.home} - how much more confusing can it get?

This person is having the same issue:
http://mysolr.com/tips/setting-solr-home-solrhome-in-jndi-on-tomcat-55/

So, how does one refer to solr home from solrconfig.xml in a JNDI
configuration scenario? Also, is there a way to debug/see variables that are
defined in a specific context, such as solrconfig.xml? I feel like I'm
completely blind here.

Thank you!
-- 
View this message in context: 
http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25286277.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Logging solr requests

2009-09-03 Thread Chris Hostetter

: - I think that the use  of log files is discouraged, but i don't know if i
: can modify solr settings to log to a server (via rmi or http)
: - Don't want to drop down solr response performance

discouraged by who? ... having a separate process tail your log file and
build an index that way is the simplest way to do this without impacting
Solr's performance ... alternately you could write a custom LogHandler that
sends the data anywhere you want (so you never need a log file) but that
would require some non-trivial async code in your LogHandler to keep the
building of your new index from affecting the performance (log calls are
synchronous)
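
A sketch of that asynchronous shape (editor's illustration against java.util.logging; the actual indexing call is left abstract):

public class IndexingLogHandler extends java.util.logging.Handler {
    private final java.util.concurrent.BlockingQueue<java.util.logging.LogRecord> queue =
        new java.util.concurrent.LinkedBlockingQueue<java.util.logging.LogRecord>();

    public IndexingLogHandler() {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        java.util.logging.LogRecord r = queue.take();
                        // feed r.getMessage() to the separate index here
                    }
                } catch (InterruptedException e) { /* shutting down */ }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    @Override public void publish(java.util.logging.LogRecord record) { queue.offer(record); } // never blocks the caller
    @Override public void flush() { }
    @Override public void close() { }
}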


-Hoss



Re: Problem querying for a value with a space

2009-09-03 Thread Chris Hostetter

: Use +specific_LIST_s:(For Sale)
: or
: +specific_LIST_s:"For Sale"

those are *VERY* different queries.

The first is just syntactic sugar for...
  +specific_LIST_s:For +specific_LIST_s:Sale

...which is not the same as the second query (especially when using
StrField or KeywordTokenizer)



-Hoss



Re: Sanity check: ResonseWriter directly to a database?

2009-09-03 Thread Avlesh Singh

 Are there any hidden gotchas--or even basic suggestions--regarding
 implementing something like a DBResponseWriter that puts responses right
 into a database?

Absolutely not! A QueryResponseWriter with an empty write method fulfills
all interface obligations. My only question is, why do you want a
ResponseWriter to do this for you? Why not write something outside Solr,
which gets the response and then puts it in the database? If it has to be a
Solr utility, then maybe a RequestHandler.
The only reason I am asking this is because your QueryResponseWriter will
have to implement a method called getContentType. Sounds illogical in your
case.
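
For illustration, a minimal sketch against the Solr 1.3-era interface (the
class name and the JDBC step are placeholders, and package names moved in
later Solr releases):

import java.io.IOException;
import java.io.Writer;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.QueryResponseWriter;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class DBResponseWriter implements QueryResponseWriter {

  public void init(NamedList args) {
    // read JDBC url, credentials, pool size, etc. from the
    // <queryResponseWriter> args in solrconfig.xml
  }

  public void write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    // walk rsp.getValues() and insert whatever you need into the database;
    // nothing has to be written to 'writer' at all
  }

  public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
    return CONTENT_TYPE_TEXT_UTF8;  // must still return something sensible
  }
}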

Any problems adding non-trivial jars to a solr plugin?

None. I have tonnes of them.

Is JSONResponseWriter a reasonable copy/paste starting point for me?  Is
 there anything that might match better, especially regarding initialization
 and connection pooling?

As I have tried to explain above, a QueryResponseWriter with an empty
write method is just perfect. You can use any one of the well-known writers
as a starting point.
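
A sketch of the registration in solrconfig.xml, assuming a class like the
one above (the name attribute becomes the wt parameter used on queries, so
wt=db would route responses through this writer):

  <queryResponseWriter name="db" class="com.example.DBResponseWriter"/>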

Say I have a read-write single-core solr server: a vanilla-out-of-the-box
 example install. Can I concurrently update the underlying index safely with
 EmbeddedSolrServer?

Yes you can! Other searchers will only come to know of changes when they are
re-opened.

Cheers
Avlesh

On Fri, Sep 4, 2009 at 3:26 AM, seanoc5 sean...@gmail.com wrote:


 Hello all,
 Are there any hidden gotchas--or even basic suggestions--regarding
 implementing something like a DBResponseWriter that puts responses right
 into a database? My specific questions are:

 1) Any problems adding non-trivial jars to a solr plugin? I'm thinking JDBC
 and then perhaps Hibernate libraries?
 I don't believe so, but I have just enough understanding to be dangerous at
 the moment.

 2) Is JSONResponseWriter a reasonable copy/paste starting point for me?  Is
 there anything that might match better, especially regarding initialization
 and connection pooling?

 3) Say I have a read-write single-core solr server: a
 vanilla-out-of-the-box
 example install. Can I concurrently update the underlying index safely with
 EmbeddedSolrServer? (This is my backup approach, less preferred)
 I assume no, one of them has to be read only, but I've learned not to
 under-estimate the lucene/solr developers.

 I'm starting with adapting JSONResponseWriter and the
 http://wiki.apache.org/solr/SolrPlugins wiki notes . The docs seem to
 indicate all I need to do is package up the appropriate supporting (jdbc)
 jar files into my MyDBResponse.jar, and drop it into the ./lib dir (e.g.
 c:\solr-svn\example\solr\lib). Of course, I need to update my
 solrconfig.xml
 to use the new DBResponseWriter.

 Straight JDBC seems like the easiest starting point. If that works,
 perhaps move the DB stuff to hibernate. Does anyone have a best practice
 suggestion for database access inside a plugin? I rather expect the answer
 might be use JNDI and well-configured hibernate; no special problems
 related to 'inside' a solr plugin. I will eventually be interested in
 saving both query results and document indexing information, so I expect to
 do this in both a (custom) ResponseWriter, and ... um... a
 DocumentAnalysisRequestHandler?

 I realize embedded solr might be a better choice (performance has been a
 big
 issue in my current implementation), and I am looking into that as well. If
 feasible, I'd like to keep solr in charge of the database content through
 plugins and extensions, rather than keeping both solr and db synced from my
 (grails) app.
 Thanks,

 Sean






Re: Exact Word Search

2009-09-03 Thread bhaskar chandrasekar
Hi Shalin,
 
Thanks for your reply.
I am not sure how the query is formed in Solr.
If you could throw some light on this, it would be helpful.
Is it achievable?
 
Regards
Bhaskar


--- On Thu, 9/3/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:


From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Exact Word Search
To: solr-user@lucene.apache.org
Date: Thursday, September 3, 2009, 5:14 AM


On Thu, Sep 3, 2009 at 1:33 PM, bhaskar chandrasekar
bas_s...@yahoo.co.inwrote:

 Hi,

 Can any one help me with the below scenario?.

 Scenario :

 I have integrated Solr with Carrot2.
 The issue is
  Assuming I give bhaskar as the input string for a search,
  it should give me search results pertaining to bhaskar only.
   Example: It should not display search results such as chandarbhaskar or
   bhaskarc.
   Basically the search should happen on the exact word match. I am not
  bothered about case sensitivity here.
  How to achieve the above Scenario in Carrot2 ?.


Bhaskar, I think this question is better suited for the Carrot mailing
lists. Unless you yourself control how the solr query is created, we will
not be able to help you.

-- 
Regards,
Shalin Shekhar Mangar.



  

Re: Sanity check: ResponseWriter directly to a database?

2009-09-03 Thread seanoc5

Avlesh,
Great response, just what I was looking for. 

As far as QueryResponseWriter vs RequestHandler: you're absolutely right,
request handling is the way to go. It looks like I can start with something
like:
public class SearchSavesToDBHandler extends RequestHandlerBase implements
SolrCoreAware
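
Fleshed out, a rough sketch under the 1.3-era API (the handler name is mine
and the persistence step is a placeholder):

import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.util.plugin.SolrCoreAware;

public class SearchSavesToDBHandler extends RequestHandlerBase
    implements SolrCoreAware {

  public void inform(SolrCore core) {
    // called once the core is ready; a good place to set up a connection pool
  }

  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // run the search, persist what you need via JDBC, and still populate
    // rsp so the caller gets a normal response
  }

  public String getDescription() { return "Saves search results to a DB"; }
  public String getSourceId() { return "SearchSavesToDBHandler"; }
  public String getSource() { return "SearchSavesToDBHandler"; }
  public String getVersion() { return "1.0"; }
}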

I am still weighing keeping this logic in my app. However, with solr-cell
coming along nicely, and the nature of my queries (95% pre-defined for
content analysis), I am leaning toward the extra work of embedding the
processing in solr. I'm still unclear what the best path is, but I think
that's fairly specific to my app.

Great news about the flexibility of having both approaches be able to work
on the same index. That may well save me if I run out of time on the plugin
development. 
Thanks for your reply, it was a great help.

Sean



Avlesh Singh wrote:
 [snip - full reply quoted above]



Re: Responses getting truncated

2009-09-03 Thread Rupert Fiasco
So we have been running LucidWorks for Solr for about a week now and
have seen no problems - so I believe it was due to that buffering
issue in Jetty 6.1.3, as diagnosed here:
 It really looks like you're hitting a lower-level IO buffering bug
 (esp when you see a response starting off with the tail of another
 response).  That doesn't look like it could be a Solr bug... but
 rather smells like a thread safety bug in the servlet container.

Thanks for everyone's help and input. LucidWorks For The Win.

-Rupert

On Fri, Aug 28, 2009 at 4:07 PM, Rupert Fiasco rufia...@gmail.com wrote:
 I deployed LucidWorks with my existing solrconfig / schema and
 re-indexed my data into it and pushed it out to production, we'll see
 how it stacks up over the weekend. Already queries that were breaking
 on the prior Jetty/stock Solr setup are now working - but I have seen
 it before where upon an initial re-index things work OK then a couple
 of days later they break.

 Keep y'all posted.

 Thanks
 -Rupert

 On Fri, Aug 28, 2009 at 3:12 PM, Rupert Fiasco rufia...@gmail.com wrote:
 Yes, I am hitting the Solr server directly (medsolr1.colo:9007)

 Versions / architectures:

 Jetty(6.1.3)

 o...@medsolr1 ~ $ uname -a
 Linux medsolr1 2.6.18-xen-r12 #9 SMP Tue Mar 3 15:34:08 PST 2009
 x86_64 Intel(R) Xeon(R) CPU L5420 @ 2.50GHz GenuineIntel GNU/Linux

 o...@medsolr1 ~ $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)


 I was thinking of trying LucidWorks for Solr (1.3.02) x64 - worth a try.

 -Rupert

 On Fri, Aug 28, 2009 at 3:08 PM, Yonik Seeley ysee...@gmail.com wrote:
 On Mon, Aug 24, 2009 at 6:30 PM, Rupert Fiasco rufia...@gmail.com wrote:
 If I run these through curl on the command line it's
 truncated, and if I run the search through the web-based admin panel
 then I get an XML parse error.

 Are you running curl directly against the solr server, or going
 through a load balancer?  Cutting out the middle-men using curl was a
 great idea - just make sure to go all the way.

 At first I thought it could possibly be a FastWriter bug (internal
 Solr class), but that's only used on the TextWriter (JSON, Python,
 Ruby) based formats, not on the original XML format.

 It really looks like you're hitting a lower-level IO buffering bug
 (esp when you see a response starting off with the tail of another
 response).  That doesn't look like it could be a Solr bug... but
 rather smells like a thread safety bug in the servlet container.

 What type of machine are you running on?  What JVM?
 You could try upgrading your version of Jetty, the JVM, or try
 switching to Tomcat.

 -Yonik
 http://www.lucidimagination.com


 This appears to have just started recently and the only thing we have
 done is change our indexer from a PHP one to a Java one, but
 functionally they are identical.

 Any thoughts? Thanks in advance.

 - Rupert