I can't get it to work

2009-12-15 Thread Faire Mii

I just can't get it.

If I have 10 tables in MySQL and they are all related to each other with
foreign keys, should I have 10 documents in Solr?


Or just one document with rows from all the tables in it?

I have tried in vain for 2 days now... please help.

regards

fayer


Log of zero result searches

2009-12-15 Thread Roland Villemoes
Hi 

Question: How do you log zero result searches?

It is quite important from a business perspective to know which searches
return zero/empty results.
Does anybody know a way to get this information? 

Roland Villemoes


Re: I can't get it to work

2009-12-15 Thread David Stuart

Hi,

The answer is it depends ;)

If your 10 tables together represent a single entity, e.g. a person, their
address, etc., then the one-document-per-entity approach works.


But if your 10 tables each represent a series of entities that you want
to surface in your search results separately, then make a document for
each (i.e. it depends on your data).


What is your use case? Do you want a search index that is able to
search on every field in your 10 tables, or just a few?
Think of it this way: if you were creating SQL to pull the data out of
the db using joins etc., which fields would you grab? Do you get multiple
rows back because some of your tables have a one-to-many relationship?
Once you have formed that query, the result is your document, minus the
duplicate information caused by the extra rows.
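A minimal sketch of that idea (the table, field names, and values below are invented for illustration): joined SQL rows for one entity collapse into a single Solr document, with the one-to-many values becoming a multivalued field.

```python
# Hypothetical joined rows for one "person" entity; a second phone number
# causes the join to return two rows for the same person.
rows = [
    {"person_id": 1, "name": "Ada", "phone": "555-0100"},
    {"person_id": 1, "name": "Ada", "phone": "555-0199"},
]

# Collapse the duplicate rows into one document: scalar fields are taken
# once, the one-to-many column becomes a multivalued field.
doc = {
    "id": rows[0]["person_id"],
    "name": rows[0]["name"],
    "phone": sorted({r["phone"] for r in rows}),
}

print(doc)  # {'id': 1, 'name': 'Ada', 'phone': ['555-0100', '555-0199']}
```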


Cheers

David

On 15 Dec 2009, at 08:05, Faire Mii faire@gmail.com wrote:


I just can't get it.

If I have 10 tables in MySQL and they are all related to each other
with foreign keys, should I have 10 documents in Solr?


Or just one document with rows from all the tables in it?

I have tried in vain for 2 days now... please help.

regards

fayer


Re: Log of zero result searches

2009-12-15 Thread David Stuart
The returning XML result tag has a numFound attribute that will report
0 if nothing matches your search criteria.
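For example, a query with no matches comes back with something like this (response abridged):

```xml
<response>
  <result name="response" numFound="0" start="0"/>
</response>
```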


David

On 15 Dec 2009, at 08:16, Roland Villemoes r...@alpha-solutions.dk  
wrote:



Hi

Question: How do you log zero result searches?

It is quite important from a business perspective to know which searches
return zero/empty results.

Does anybody know a way to get this information?

Roland Villemoes


Re: Log of zero result searches

2009-12-15 Thread Roland Villemoes
Yes, correct.

But to use that, the search client must collect this information whenever we
have 0 results.
I do not want that to be part of the client application (quite hard when that
is SolrJS) - this should be collected server side, in Solr.
Do you know how to do that?

Roland

-----Original Message-----
From: David Stuart [mailto:david.stu...@progressivealliance.co.uk]
Sent: 15 December 2009 09:33
To: solr-user@lucene.apache.org
Subject: Re: Log of zero result searches

The returning XML result tag has a numFound attribute that will report  
0 if nothing matches your search criteria

David

On 15 Dec 2009, at 08:16, Roland Villemoes r...@alpha-solutions.dk  
wrote:

 Hi

 Question: How do you log zero result searches?

 It is quite important from a business perspective to know which searches
 return zero/empty results.
 Does anybody know a way to get this information?

 Roland Villemoes


Re: Document model suggestion

2009-12-15 Thread Shalin Shekhar Mangar
On Tue, Dec 15, 2009 at 7:26 AM, caman aboxfortheotherst...@gmail.com wrote:


 Appreciate any guidance here please. I have a master-child relationship
 between two tables, 'TA' and 'TB', where the former is the master table. Any
 row in TA can have multiple rows in TB.
 e.g. row in TA

 id---name
 1---tweets

 TB:
 id|ta_id|field0|field1|field2|...|field20|created_by
 1|1|value1|value2|...|value20|User1

 snip/


 This works fine and indexes the data. But all the data for a row in TA gets
 combined into one document (not desirable).
 I am not clear on how to

 1) separate a particular row in the search results.
 e.g. if I search for 'Android' and there are 5 rows for Android in TB for a
 particular instance in TA, I would like to show them separately to the user
 and,
 if the user clicks on any of the rows, point them to an attached URL in the
 application. Should a separate index be maintained for each row in TB? TB
 can
 have millions of rows.


The easy answer is that whatever you want to show as results should be the
thing that you index as documents. So if you want to show tweets as results,
one document should represent one tweet.

Solr is different from relational databases, and you should not think about
the two in the same way. De-normalization is the way to go in Solr.
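As a rough sketch of that denormalization (field names are taken from the example above; the exact schema is an assumption), each TB row would be posted as its own document, with the parent TA values copied in:

```xml
<!-- One document per tweet; parent-table (TA) values are duplicated into it -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="ta_name">tweets</field>
    <field name="field0">value1</field>
    <field name="created_by">User1</field>
  </doc>
</add>
```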


 2) How to protect one user's data from another user. I guess I can keep a
 column for a user_id in the schema and append that filter automatically
 when
 I search through SOLR. Any better alternatives?


That is usually what people do. The hard part is when some documents are
shared across multiple users.
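A sketch of that filter-appending approach, done on the server or middleware side rather than in the browser client (the `user_id` field name and base URL are assumptions):

```python
from urllib.parse import urlencode

def build_search_url(user_query, user_id,
                     base="http://localhost:8983/solr/select"):
    """Append a per-user fq so users only ever see their own documents."""
    params = urlencode({"q": user_query, "fq": f"user_id:{user_id}"})
    return f"{base}?{params}"

print(build_search_url("android", 42))
# http://localhost:8983/solr/select?q=android&fq=user_id%3A42
```

Because the filter is added after the user's input is taken, the user cannot override it from the query string.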


 Bear with me if these are newbie questions please, this is my first day
 with
 SOLR.


No problem. Welcome to Solr!

-- 
Regards,
Shalin Shekhar Mangar.


Re: Not able to display search results on Tomcat/Solrj

2009-12-15 Thread Shalin Shekhar Mangar
On Tue, Dec 15, 2009 at 1:07 AM, insaneyogi3008 insaney...@gmail.com wrote:


 Hello,

 I am running a simple program,
 http://old.nabble.com/file/p26779970/SolrjTest.java SolrjTest.java , to get
 search results from a remote Solr server. I seem to correctly get back the
 number of documents that match my query, but I am not able to display the
 search results themselves.

 My question is: is this a known issue? I have attached the test; below is
 a sample of the result:


What are displayname and displayphone? Are they even in your schema?
Print out the SolrDocument object directly and you should see the results.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Query on Cache size.

2009-12-15 Thread Shalin Shekhar Mangar
On Mon, Dec 14, 2009 at 7:17 PM, kalidoss 
kalidoss.muthuramalin...@sifycorp.com wrote:

 Hi,

   We have enabled the query result cache with 512 entries,

   and we have calculated the size used for the cache:
   page size about 1000 bytes, (1000*512)/1024/1024 ≈ 0.49 MB


The query result cache is a map of (q, sort, n) to an ordered list of Lucene
docids. Assuming queryResultWindowSize is 20 and an average user does not go
beyond 20 results, the memory usage of the values in this map is
approx 20*sizeof(int)*512. Add some more for keys, map, references, etc.
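Spelling that estimate out (same assumptions as above: a window of 20 cached docids per entry, 4-byte ints, 512 entries):

```python
# Back-of-the-envelope estimate of queryResultCache value memory.
entries = 512          # configured cache size
window = 20            # queryResultWindowSize: docids cached per entry
bytes_per_docid = 4    # sizeof(int)

value_bytes = entries * window * bytes_per_docid
print(value_bytes)         # 40960
print(value_bytes / 1024)  # 40.0 (KB), plus overhead for keys, map, references
```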

-- 
Regards,
Shalin Shekhar Mangar.


Re: Auto update with deltaimport

2009-12-15 Thread Olala

Hi, thanks! I've done it by writing a script that calls
http://localhost:8080/solr/dataimport?command=delta-import automatically :-)
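For reference, the cron entry for such a script might look like this (the URL and the 15-minute interval are just assumptions):

```
# Run a Solr delta-import every 15 minutes
*/15 * * * * wget -q -O /dev/null "http://localhost:8080/solr/dataimport?command=delta-import"
```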


Joel Nylund wrote:
 
 windows or unix?
 
 unix - make a shell script and call it from cron
 
 windows - make a .bat or .cmd file and call it from scheduler
 
 within the shell scripts/bat files use wget or curl to call the right  
 import:
 
 wget -q -O /dev/null
 http://localhost:8983/solr/dataimport?command=delta-import
 
 
 Joel
 
 On Dec 12, 2009, at 1:38 AM, Olala wrote:
 

 Hi All!

 I am developing a search engine using Solr. I have tested the full-import
 and delta-import commands successfully. But now I want to run delta-import
 automatically on my own schedule. So, can anyone help me?

 Thanks & Regards,
 -- 
 View this message in context:
 http://old.nabble.com/Auto-update-with-deltaimport-tp26755386p26755386.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 

-- 
View this message in context: 
http://old.nabble.com/Auto-update-with-deltaimport-tp26755386p26792041.html
Sent from the Solr - User mailing list archive at Nabble.com.



maximum no of values in multi valued string field

2009-12-15 Thread bharath venkatesh

Hi ,
  Is there any limit on the number of values stored in a single multi-valued
string field? If a single multi-valued string field contains
1000-2000 string values, what will be the effect on query performance (we
will only be indexing this field, not storing it)? Is it better to
store all the strings in a single text field instead of a multi-valued
string field?


Thanks in Advance,
Bharath




Re: maximum no of values in multi valued string field

2009-12-15 Thread Shalin Shekhar Mangar
On Tue, Dec 15, 2009 at 3:13 PM, bharath venkatesh 
bharath.venkat...@ibibogroup.com wrote:

 Hi ,
 Is there any limit on the number of values stored in a single multi-valued
 string field?


There is no theoretical limit. There are practical limits because your
documents become heavier, and the document cache stores Lucene documents in
memory.


 if a single multi-valued string field contains 1000-2000 string values, what
 will be the effect on query performance (we will only be indexing this field,
 not storing it)?


Yes, the more tokens there are, the longer it may take to search across
them. Faceting performance can drop drastically for such a large number of
values.


 is it better to store all the strings in a single text field instead of a
 multi-valued string field?


It wouldn't make a lot of difference. The XML response may be a bit shorter.
With a single field, highlighting can cause adjacent terms to be highlighted,
which you may not want.

-- 
Regards,
Shalin Shekhar Mangar.


Re: I can't get it to work

2009-12-15 Thread regany


I've only just started with Solr too.

As a newbie, first I'd say forget about trying to compare it to your MySQL
database.

It's completely different and performs its own job in its own way. You
feed a document in, and you store that information in the most efficient
manner you can to perform the search and return the results you want.

So ask, what do I want to search against?

field1
field2
field3

That's what you feed into Solr.

Then ask, what information do I want to return after a search? This
determines how you store the information you've just fed into Solr. Say
you want to return:

field2

Then you might accept field1, field2, and field3 and merge them together
into 1 searchable field called searchtext. This is what users will search
against. Then you'd also have field2 as another field.

field2 (not indexed, stored)
searchtext (combination of field1, field2, field3 - indexed, not stored)

So then you could search against searchtext and return field2 as the
result.

Hope that provides some explanation (I know it's basic). From my very
limited experience with it, Solr is great. My biggest hurdle was getting my
head around the fact that it's NOT a relational database (i.e. MySQL) but a
separate tool that you configure in the best way for your search, and only
that.
-- 
View this message in context: 
http://old.nabble.com/I-cant-get-it-to-work-tp26791099p26792373.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Log of zero result searches

2009-12-15 Thread Shalin Shekhar Mangar
On Tue, Dec 15, 2009 at 2:36 PM, Roland Villemoes 
r...@alpha-solutions.dkwrote:

 Yes, correct.

 But to use that, the search client must collect this information whenever
 we have 0 results.
 I do not want that to be part of the client application (quite hard when
 that is SolrJS) - this should be collected server side, in Solr.
 Do you know how to do that?


The number of hits is logged along with each query at INFO level. You can
analyze the logs to extract this stat.
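A sketch of that log analysis (the log lines below are invented, and the exact format varies by Solr version and log configuration, so the regex will need adjusting for real logs):

```python
import re

# Example Solr INFO-level request log lines (hypothetical format):
log_lines = [
    "INFO: [] webapp=/solr path=/select params={q=solr+rocks} hits=12 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=unobtanium} hits=0 status=0 QTime=1",
]

# Pull out the q parameter from every request that returned zero hits.
pattern = re.compile(r"params=\{q=([^}&]*)[^}]*\} hits=0 ")
zero_hit_queries = [m.group(1)
                    for line in log_lines
                    for m in pattern.finditer(line)]

print(zero_hit_queries)  # ['unobtanium']
```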

-- 
Regards,
Shalin Shekhar Mangar.


Re: question regarding dynamic fields

2009-12-15 Thread Shalin Shekhar Mangar
On Mon, Dec 14, 2009 at 1:00 PM, Phanindra Reva reva.phanin...@gmail.com wrote:

 Hello,
 I have observed that text or keywords indexed using the
 dynamicField concept are searchable only when we also mention the
 field name while querying. Am I wrong with my observation,
 or is that the default behaviour that cannot be changed? I am wondering
 if there is any route to search text indexed using dynamicFields without
 having to mention the field name in the query.
 Thanks.


If you are asking whether you can give *_s to search on all dynamic fields
ending with _s, then the answer is no. You must specify the field name.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Payloads with Phrase queries

2009-12-15 Thread Raghuveer Kancherla
The interesting thing I am noticing is that the scoring works fine for a
phrase query like "solr rocks".
This led me to look at what query I am using in the case of a single term.
It turns out that I am using PayloadTermQuery, taking a cue from the
SOLR-1485 patch.

I changed this to BoostingTermQuery (I read somewhere that this is
deprecated... but I was just experimenting) and the scoring seems to work as
expected now for a single term.

Now, the important question is: what is the payload version of a TermQuery?

Regards
Raghu


On Tue, Dec 15, 2009 at 12:45 PM, Raghuveer Kancherla 
raghuveer.kanche...@aplopio.com wrote:

 Hi,
 Thanks everyone for the responses. I am now able to get both phrase queries
 and term queries to use payloads.

 However, the score value for each document (and consequently the
 ordering of documents) is coming out wrong.

 In the Solr output appended below, document 4 has a score higher than
 document 2 (look at the debug part). The results section shows a wrong score
 (which is the payload value I am returning from my custom similarity class)
 and the ordering is also wrong because of this. Can someone explain this?

 My custom query parser is pasted here http://pastebin.com/m9f21565

 In the similarity class, I return 10.0 if payload is 1 and 20.0 if payload
 is 2. For everything else I return 1.0.

 {
  'responseHeader':{
   'status':0,
   'QTime':2,
   'params':{
   'fl':'*,score',
   'debugQuery':'on',
   'indent':'on',


   'start':'0',
   'q':'solr',
   'qt':'aplopio',
   'wt':'python',
   'fq':'',
   'rows':'10'}},
  'response':{'numFound':5,'start':0,'maxScore':20.0,'docs':[


   {
'payloadTest':'solr|2 rocks|1',
'id':'2',
'score':20.0},
   {
'payloadTest':'solr|2',
'id':'4',
'score':20.0},


   {
'payloadTest':'solr|1 rocks|2',
'id':'1',
'score':10.0},
   {
'payloadTest':'solr|1 rocks|1',
'id':'3',
'score':10.0},


   {
'payloadTest':'solr',
'id':'5',
'score':1.0}]
  },
  'debug':{
   'rawquerystring':'solr',
   'querystring':'solr',


   'parsedquery':'PayloadTermQuery(payloadTest:solr)',
   'parsedquery_toString':'payloadTest:solr',
   'explain':{
   '2':'\n7.227325 = (MATCH) fieldWeight(payloadTest:solr in 1), product 
 of:\n  14.142136 = (MATCH) btq, product of:\n0.70710677 = 
 tf(phraseFreq=0.5)\n20.0 = scorePayload(...)\n  0.81767845 = 
 idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=1)\n',


   '4':'\n11.56372 = (MATCH) fieldWeight(payloadTest:solr in 3), product 
 of:\n  14.142136 = (MATCH) btq, product of:\n0.70710677 = 
 tf(phraseFreq=0.5)\n20.0 = scorePayload(...)\n  0.81767845 = 
 idf(payloadTest:  solr=5)\n  1.0 = fieldNorm(field=payloadTest, doc=3)\n',


   '1':'\n3.6136625 = (MATCH) fieldWeight(payloadTest:solr in 0), product 
 of:\n  7.071068 = (MATCH) btq, product of:\n0.70710677 = 
 tf(phraseFreq=0.5)\n10.0 = scorePayload(...)\n  0.81767845 = 
 idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=0)\n',


   '3':'\n3.6136625 = (MATCH) fieldWeight(payloadTest:solr in 2), product 
 of:\n  7.071068 = (MATCH) btq, product of:\n0.70710677 = 
 tf(phraseFreq=0.5)\n10.0 = scorePayload(...)\n  0.81767845 = 
 idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=2)\n',


   '5':'\n0.578186 = (MATCH) fieldWeight(payloadTest:solr in 4), product 
 of:\n  0.70710677 = (MATCH) btq, product of:\n0.70710677 = 
 tf(phraseFreq=0.5)\n1.0 = scorePayload(...)\n  0.81767845 = 
 idf(payloadTest:  solr=5)\n  1.0 = fieldNorm(field=payloadTest, doc=4)\n'},


   'QParser':'BoostingTermQParser',
   'filter_queries':[''],
   'parsed_filter_queries':[],
   'timing':{
   'time':2.0,
   'prepare':{
'time':1.0,


'org.apache.solr.handler.component.QueryComponent':{
 'time':1.0},
'org.apache.solr.handler.component.FacetComponent':{
 'time':0.0},
'org.apache.solr.handler.component.MoreLikeThisComponent':{


 'time':0.0},
'org.apache.solr.handler.component.HighlightComponent':{
 'time':0.0},
'org.apache.solr.handler.component.StatsComponent':{
 'time':0.0},
'org.apache.solr.handler.component.DebugComponent':{


 'time':0.0}},
   'process':{
'time':1.0,
'org.apache.solr.handler.component.QueryComponent':{
 'time':0.0},
'org.apache.solr.handler.component.FacetComponent':{


 'time':0.0},
'org.apache.solr.handler.component.MoreLikeThisComponent':{
 'time':0.0},
'org.apache.solr.handler.component.HighlightComponent':{
 'time':0.0},


'org.apache.solr.handler.component.StatsComponent':{
 'time':0.0},
'org.apache.solr.handler.component.DebugComponent':{
 'time':1.0}













search in all fields for multiple values?

2009-12-15 Thread Faire Mii

I have two fields:

title
body

and I want to search for two words,

dog
OR
cat

in each of them.

I have tried q=*:dog OR cat

but it doesn't work.

How should I type it?

PS. Could I set the default search field to ALL fields in schema.xml in
some way?


Re: search in all fields for multiple values?

2009-12-15 Thread Shalin Shekhar Mangar
On Tue, Dec 15, 2009 at 5:35 PM, Faire Mii faire@gmail.com wrote:

 i have two fields:

 title
 body

 and i want to search for two words

 dog
 OR
 cat

 in each of them.

 i have tried q=*:dog OR cat

 but it doesnt work.

 how should i type it?

 PS. could i enter default search field = ALL fields in schema.xml in
 someway?


See
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_search_for_.22superman.22_in_both_the_title_and_subject_fields

You can also create a copyField to which you can copy both title and body
and specify that as the default search field.
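Without a copyField, the per-field form of the query has to name each field explicitly. A sketch (assuming the standard query parser; the host and port are the example defaults):

```python
from urllib.parse import urlencode

# Search for dog OR cat in both the title and body fields
q = "title:(dog OR cat) OR body:(dog OR cat)"
print("http://localhost:8983/solr/select?" + urlencode({"q": q}))
```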

-- 
Regards,
Shalin Shekhar Mangar.


solr php client vs file_get_contents?

2009-12-15 Thread Faire Mii

I am using PHP to access Solr and I wonder one thing.

Why should I use the Solr PHP client when I can use

$serializedResult = file_get_contents('http://localhost:8983/solr/select?q=niklas&wt=phps');

to get the result in arrays and then print them out?

I don't really get the difference. Are there any richer features in
the PHP client?


regards

fayer

Re: solr php client vs file_get_contents?

2009-12-15 Thread Israel Ekpo
On Tue, Dec 15, 2009 at 8:49 AM, Faire Mii faire@gmail.com wrote:

 i am using php to access solr and i wonder one thing.

 why should i use solr php client when i can use

 $serializedResult = file_get_contents('http://localhost:8983/solr/select?q=niklas&wt=phps');

 to get the result in arrays and then print them out?

 i dont really get the difference. is there any richer features with the php
 client?


 regards

 fayer



Hi Faire,

Have you actually used this library before? I think the library is pretty
well thought out.

From a simple glance at the source code you can see that one can use it for
the following purposes:

1. Adding documents to the index (which you cannot do with
file_get_contents alone). So that's one difference.

2. Updating existing documents

3. Deleting existing documents.

4. Balancing requests across multiple backend servers

There are other operations with the Solr server that the library can also
perform.

Some example of what I am referring to is illustrated here

http://code.google.com/p/solr-php-client/wiki/FAQ

http://code.google.com/p/solr-php-client/wiki/ExampleUsage

IBM also has an interesting article illustrating how to add documents to the
Solr index and issue commit and optimize calls using this library.

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

The author of the library can probably give you more details on what the
library has to offer.

I think you should download the source code and spend some time looking at
all the features it has to offer.

In my opinion, it is not fair to compare a well thought out library like
that with a simple php function.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: solr php client vs file_get_contents?

2009-12-15 Thread Donovan Jimenez
In the end, the PHP client does a file_get_contents for doing a  
search the same way you'd do it manually.  It's all PHP, so you can  
do anything it does yourself. It provides what any library of PHP  
classes should - convenience. I use the JSON response writer because  
it gets the most attention from the Solr community of all the non-XML  
writers, yet is still very quick to parse (you might want to do your  
own tests comparing the speed of unserializing a Solr phps response  
versus json_decode'ing the json version).


Happy Solr'ing,
- Donovan

On Dec 15, 2009, at 8:49 AM, Faire Mii wrote:


i am using php to access solr and i wonder one thing.

why should i use solr php client when i can use

$serializedResult = file_get_contents('http://localhost:8983/solr/select?q=niklas&wt=phps');


to get the result in arrays and then print them out?

i dont really get the difference. is there any richer features with  
the php client?



regards

fayer




Re: Payloads with Phrase queries

2009-12-15 Thread Bill Au
Lucene 2.9.1 comes with a PayloadTermQuery:
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/search/payloads/PayloadTermQuery.html

I have been using that to use the payload as part of the score without any
problem.

Bill


On Tue, Dec 15, 2009 at 6:31 AM, Raghuveer Kancherla 
raghuveer.kanche...@aplopio.com wrote:

 The interesting thing I am noticing is that the scoring works fine for a
 phrase query like solr rocks.
 This lead me to look at what query I am using in case of a single term.
 Turns out that I am using PayloadTermQuery taking a cue from solr-1485
 patch.

 I changed this to BoostingTermQuery (i read somewhere that this is
 deprecated .. but i was just experimenting) and the scoring seems to work
 as
 expected now for a single term.

 Now, the important question is what is the Payload version of a TermQuery?

 Regards
 Raghu


 On Tue, Dec 15, 2009 at 12:45 PM, Raghuveer Kancherla 
 raghuveer.kanche...@aplopio.com wrote:

  [quoted message and debug output snipped]

Re: Document model suggestion

2009-12-15 Thread caman

Shalin,

Thanks, much appreciated.
A question about:
 That is usually what people do. The hard part is when some documents are
shared across multiple users. 

What do you recommend when documents have to be shared across multiple users?
Can't I just multi-value a field with all the users who have access to the
document?


thanks

Shalin Shekhar Mangar wrote:
 
 [quoted message snipped]
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document model suggestion

2009-12-15 Thread Erick Erickson
Yes, that should work. One hard part is what happens if your
authorization model has groups, especially when membership
in those groups changes. Then you have to go in and update
all the affected docs.

FWIW
Erick

On Tue, Dec 15, 2009 at 12:24 PM, caman aboxfortheotherst...@gmail.com wrote:


 Shalin,

 Thanks. much appreciated.
 Question about:
  That is usually what people do. The hard part is when some documents are
 shared across multiple users. 

 What do you recommend when documents has to be shared across multiple
 users?
 Can't I just multivalue a field with all the users who has access to the
 document?


 thanks

 Shalin Shekhar Mangar wrote:
 [quoted message snipped]

 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Document model suggestion

2009-12-15 Thread caman

Erick,
I know what you mean.
I wonder if it is actually cleaner to keep the authorization model out of the
Solr index and filter the data on the client side based on the user's access
rights.
Thanks all for the help.



Erick Erickson wrote:
 
 Yes, that should work. One hard part is what happens if your
 authorization model has groups, especially when membership
 in those groups changes. Then you have to go in and update
 all the affected docs.
 
 FWIW
 Erick
 
 On Tue, Dec 15, 2009 at 12:24 PM, caman
 aboxfortheotherst...@gmail.com wrote:

 [quoted message snipped]
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 

 --
 View this message in context:
 http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html
Sent from the Solr - User mailing list archive at Nabble.com.
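If one goes the other way instead — keeping authorization in the index via a multivalued field listing the users allowed to see each document, as floated earlier in the thread — the client side just appends a filter query per request. A minimal stdlib sketch (the allowed_users field name and URL layout are illustrative assumptions, not something from the thread):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UserFilter {
    // Build a select URL that restricts results to documents whose
    // multivalued "allowed_users" field (a hypothetical name) contains
    // the current user's id.
    static String searchUrl(String baseUrl, String userQuery, String userId) {
        try {
            String q  = URLEncoder.encode(userQuery, "UTF-8");
            String fq = URLEncoder.encode("allowed_users:" + userId, "UTF-8");
            return baseUrl + "/select?q=" + q + "&fq=" + fq;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("http://localhost:8983/solr", "android", "user1"));
        // -> http://localhost:8983/solr/select?q=android&fq=allowed_users%3Auser1
    }
}
```

The caveat from the thread still applies: if group membership changes, every affected document has to be re-indexed.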



Exception from Spellchecker

2009-12-15 Thread Rafael Pappert
Hello List,

I try to enable the spellchecker in my 1.4.0 solr (running with tomcat 6 on 
debian).
But I always get the following exception, when I try to open 
http://localhost:8080/spell?:


HTTP Status 500 - null java.lang.NullPointerException at 
java.io.StringReader.init(StringReader.java:33) at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at 
org.apache.solr.search.QParser.getQuery(QParser.java:131) at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) 
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) 
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) 
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at 
java.lang.Thread.run(Thread.java:619)

My Configuration looks like this:

solrconfig.xml

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">a_spell</str>
    <str name="field">a_spell</str>
    <str name="buildOnOptimze">true</str>
    <str name="buildOnCommit">false</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>

</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

schema.xml

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100"
    stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

   ...

<field name="a_spell" type="textSpell" />

I don't know what's wrong with the given configuration, and the exception is not 
really clear ;)
Can somebody give me a hint? Thank you in anticipation.

Best regards,
Rafael.



Re: Exception from Spellchecker

2009-12-15 Thread Sascha Szott

Hi Rafael,

Rafael Pappert wrote:

I try to enable the spellchecker in my 1.4.0 solr (running with tomcat 6 on 
debian).
But I always get the following exception, when I try to open 
http://localhost:8080/spell?:


The spellcheck=true pair is missing in your request. Try

http://localhost:8080/spell?q=...&spellcheck=true

-Sascha



facet.field problem in SolrParams to NamedList

2009-12-15 Thread Nestor Oviedo
Hi!
I wrote a subclass of DisMaxQParserPlugin to add a little filter for
processing the q param and generate a fq param.
E.g.: q=something field:value becomes q=something value&fq=field:value

To do this, in the createParser method, I apply a regular expression
to the qstr param to obtain the fq part, and then I do the following:

NamedList<Object> paramsList = params.toNamedList();
paramsList.add(CommonParams.FQ, generatedFilterQuery);
params = SolrParams.toSolrParams(paramsList);
req.setParams(params);

The problem is when I include two facet.field parameters in the request. In the
results (facets section) it prints [Ljava.lang.String;@c77a748,
which is the result of a toString() over a String[].

So, digging a little deeper into the code, I saw that the method
SolrParams.toNamedList() was saving the array correctly, but the method
SolrParams.toSolrParams(NamedList) was doing
params.getVal(i).toString(), so it always loses the array.

Something similar occurs with the methods SolrParams.toMap() and
SolrParams.toMultiMap().

Is this a bug ?

thanks.
Nestor
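For what it's worth, the q-to-fq rewrite described at the top of this message (independent of the SolrParams round-trip that loses the array) can be sketched with plain java.util.regex; the pattern and class name here are illustrative, not Solr internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuerySplitter {
    private static final Pattern FIELD_TERM = Pattern.compile("(\\w+):(\\S+)");

    // Turn "something field:value" into ["something value", "field:value"]:
    // element 0 is the cleaned q, the rest are generated fq clauses.
    static String[] split(String qstr) {
        List<String> fqs = new ArrayList<String>();
        StringBuffer cleaned = new StringBuffer();
        Matcher m = FIELD_TERM.matcher(qstr);
        while (m.find()) {
            fqs.add(m.group(1) + ":" + m.group(2));
            // keep only the bare value in the main query
            m.appendReplacement(cleaned, Matcher.quoteReplacement(m.group(2)));
        }
        m.appendTail(cleaned);
        String[] out = new String[fqs.size() + 1];
        out[0] = cleaned.toString().trim();
        for (int i = 0; i < fqs.size(); i++) out[i + 1] = fqs.get(i);
        return out;
    }

    public static void main(String[] args) {
        String[] r = split("something field:value");
        System.out.println(r[0] + " | " + r[1]); // something value | field:value
    }
}
```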


Re: Using lucenes custom filters in solr

2009-12-15 Thread pavan kumar donepudi
Hi All,

  I have a custom filter for lucene ,Can anyone help me how to use this
in SOLR.

Thanks in advance,
Pavan


Can solr web site have multiple versions of online API doc?

2009-12-15 Thread Teruhiko Kurosaka
Lucene keeps multiple versions of its API doc online at
http://lucene.apache.org/java/X_Y_Z/api/all/index.html
for version X.Y.Z.  I am finding this very useful when 
comparing different versions.  This is also good because
the javadoc comments that I write for my software can
reference the API comments of the exact version of
Lucene that I am using.

At Solr site, I can only find the API doc of the trunk
build.  I cannot find 1.3.0 API doc, for example.

Can Solr site also maintain the API docs for the past
stable versions ?

-kuro 

Re: Using lucenes custom filters in solr

2009-12-15 Thread AHMET ARSLAN

 Hi All,
 
       I have a custom filter for lucene ,Can
 anyone help me how to use this
 in SOLR.

http://wiki.apache.org/solr/SolrPlugins#Tokenizer_and_TokenFilter
http://wiki.apache.org/solr/SolrPlugins#Analyzer





synonyms

2009-12-15 Thread Peter A. Kirk
Hi



It appears that Solr reads a synonym list at startup from a text file.

Is it possible to alter this behaviour so that Solr obtains the synonym list 
from a database instead?



Thanks,

Peter



wildcard oddity

2009-12-15 Thread Joe Calderon
I'm trying to do a wild card search:

q=item_title:(gets*)   returns no results
q=item_title:(gets)    returns results
q=item_title:(get*)    returns results


seems like * at the end of a token is requiring a character; instead
of being 0 or more it's acting like 1 or more

the text I'm trying to match is "The Gang Gets Extreme: Home Makeover Edition"

the field uses the following analyzers

<fieldType name="text_token" class="solr.TextField"
    positionIncrementGap="100" omitNorms="false">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="0" catenateAll="1"
        splitOnNumerics="0" splitOnCaseChange="0" stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>


is anybody else having similar problems?


best,
--joe


Re: Concurrent Merge Scheduler MaxThread Count

2009-12-15 Thread Chris Hostetter

: I'm having trouble getting Solr to use more than one thread during index 
: optimizations.  I have the following in my solrconfig.xml:
:  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
:    <int name="maxThreadCount">6</int>
:  </mergeScheduler>

How many segments do you have?

I'm not an expert on segment merging, but I'm pretty sure the number of 
threads it will use is limited based on the number of segments -- so even 
though you say "use up to 6" it only uses one if that's all that it can 
use.



-Hoss



Re: Spellchecking - Is there a way to do this?

2009-12-15 Thread Chris Hostetter

: My first problem appears because I need suggestions inclusive when the
: expression has returned results. It's seems that only appear
: suggestions when there are no results. Is there a way to do so?

can you give us an example of what your queries look like?  with the 
example configs, i can get matches, as well as suggestions...


http://localhost:8983/solr/spell?q=idespellcheck=true

: The second question is: For the purposes that I've mentioned, is the
: best way to use spellchecker or mlt component? Or some other (as a
: fuzzy query)?

there's no clear cut answer to that -- I don't remember anyone else ever 
asking about anything particularly similar to what you're doing, so I 
don't know that there is any precedent for a "best" way to go about it.



-Hoss



Filter exclusion on query facets?

2009-12-15 Thread Mat Brown
Hi all,

Just wondering if it's possible to do filter exclusion (i.e.,
multiselect faceting) on query facets in Solr 1.4?

Thanks!
Mat


Re: Converting java date to solr date and querying dates

2009-12-15 Thread Chris Hostetter

:   i want to store dates into a date field called publish date in solr.
: how do we do it using solrj

I'm pretty sure that when indexing docs, you can add Date objects directly 
to the SolrInputDocument as field values -- but I'm not 100% certain (I 
don't use SolrJ much)

:   likewise how do we query from solr using java date? do we always have
: to convert it into UTC field and then query it?

all of the query APIs are based on query strings -- so yes, you need to 
construct the query string on your client side, and yes, that includes 
formatting in UTC.

:   How do i query solr for documents published on monday or for documents
: published on March etc.

if you mean "march of any year" or "any monday ever" then there isn't any 
built-in support for anything like that ... your best bet would either be 
to add month_of_year and day_of_week fields and populate them in your 
client code, or write an UpdateProcessor to run in solr (that could be 
pretty generic if you want to contribute it back; other people could find 
it useful)

If you mean "published in the most recent march" or "published on the 
most recent monday" where you don't have to change anything to have the 
query "do what i mean" as time moves on, then you'd either need to do that 
when building up your query, or write it as a QParser plugin.

:   or in that case even apply range queries on it??

basic range queries are easy...

http://wiki.apache.org/solr/SolrQuerySyntax
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html



-Hoss
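To make the UTC formatting concrete, here is a small stdlib sketch of converting a java.util.Date into the ISO-8601/UTC form Solr's DateField expects (the field names in the note below are illustrative):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // Solr's DateField wants ISO-8601 instants in UTC, e.g. 2009-12-15T10:30:00Z
    static String toSolr(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(toSolr(new Date(0L))); // 1970-01-01T00:00:00Z
    }
}
```

The formatted values then drop straight into range queries, e.g. publish_date:[2009-03-01T00:00:00Z TO 2009-03-31T23:59:59Z].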



Re: Reverse sort facet query

2009-12-15 Thread Chris Hostetter

: Does anyone know of a good way to perform a reverse-sorted facet query (i.e. 
rarest first)?

I'm fairly confident that code doesn't exist at the moment.  

If I remember correctly, it would be fairly simple to implement if you'd 
like to submit a patch:  when sorting by count a simple bounded priority 
queue is used, so we'd just have to change the comparator.  If you're 
interested in working on a patch it should be in SimpleFacets.java.  I 
think the queue is called BoundedTreeSet


(that's a pretty novel request actually ... i don't remember anyone else 
ever asking for anything like this before .. can you describe your use 
case a bit  -- i'm curious as to how/when you would use this data)



-Hoss
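The comparator flip being described can be sketched in plain Java: a bounded queue that keeps the N entries with the lowest counts by evicting the most frequent one whenever the bound is exceeded (the class and method names are illustrative, not Solr's actual BoundedTreeSet):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class RarestFirst {
    // Keep only the `limit` facet values with the *lowest* counts.
    static List<Map.Entry<String, Integer>> rarest(Map<String, Integer> counts, int limit) {
        // max-heap on count: the head is the most frequent entry seen so far
        PriorityQueue<Map.Entry<String, Integer>> pq =
                new PriorityQueue<Map.Entry<String, Integer>>(
                        (a, b) -> b.getValue() - a.getValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            pq.add(e);
            if (pq.size() > limit) pq.poll(); // evict the most frequent
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>(pq);
        out.sort((a, b) -> a.getValue() - b.getValue()); // rarest first
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("common", 100);
        counts.put("uncommon", 10);
        counts.put("rare", 1);
        System.out.println(rarest(counts, 2)); // [rare=1, uncommon=10]
    }
}
```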



Re: Can solr web site have multiple versions of online API doc?

2009-12-15 Thread Israel Ekpo
2009/12/15 Teruhiko Kurosaka k...@basistech.com

 Lucene keeps multiple versions of its API doc online at
 http://lucene.apache.org/java/X_Y_Z/api/all/index.html
 for version X.Y.Z.  I am finding this very useful when
 comparing different versions.  This is also good because
 the javadoc comments that I write for my software can
 reference the API comments of the exact version of
 Lucene that I am using.

 At Solr site, I can only find the API doc of the trunk
 build.  I cannot find 1.3.0 API doc, for example.

 Can Solr site also maintain the API docs for the past
 stable versions ?

 -kuro


Hi Teruhiko

If you downloaded the 1.3.0 release, you should find a docs folder inside
the zip file.

This contains the javadoc for that particular release.

You may also re-download a 1.3.0 release to get the docs for Solr 1.3.

I hope this helps.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Log of zero result searches

2009-12-15 Thread Chris Hostetter

: Subject: Log of zero result searches
: References: 26747482.p...@talk.nabble.com 26748588.p...@talk.nabble.com
:  359a9283091203m73b4dc9ya51aa97e460b3...@mail.gmail.com
:  26756663.p...@talk.nabble.com 26776651.p...@talk.nabble.com
:  359a92830912141657r79881e4bg3a4370d81ea7e...@mail.gmail.com
: In-Reply-To: 359a92830912141657r79881e4bg3a4370d81ea7e...@mail.gmail.com


http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss



RE: Request Assistance with DIH

2009-12-15 Thread Turner, Robbin J
Thanks for the reply, just what I was looking for in an answer.  I am running 
under Tomcat 6 on Solaris 10; the person that replied before you looks like 
they're running under Jetty.  I have configured the JNDI context.  I stop and start 
tomcat using the Solaris SMF, equivalent to services in Linux.  But my cwd is 
pointing to root; I have solr home specified in Catalina/localhost/solr.xml.  Is 
there anything else that I can do to force the cwd to point to solr/home?

Thanks again
Robbin

-Original Message-
From: Ken Lane (kenlane) [mailto:kenl...@cisco.com] 
Sent: Monday, December 14, 2009 11:04 AM
To: solr-user@lucene.apache.org
Subject: RE: Request Assistance with DIH

Hi Robbin,

I just went through this myself (I am a newbie).

The key things to look at are: 

1. Your data_config.xml. I created a table called 'foo' and an 
ora_data_config.xml file with a simple example to get it working that looks 
like this:

<dataConfig>
  <dataSource type="JdbcDataSource"
      driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@servername.cisco.com:1521:sidname"
      user="scott"
      password="tiger"/>
  <document>
    <entity name="user"
        query="SELECT id, username from foo">
    </entity>
  </document>
</dataConfig>

Some gotcha's: 
If your Oracle DB is configured with Service_name rather than SID (ie.
You may be running failover, RAC, etc), the url parameter of jdbc connection 
can read like this:

  url="jdbc:oracle:thin:@(DESCRIPTION = (LOAD_BALANCE = on)
(FAILOVER = on) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
your_server_name_here.cisco.com)(PORT = 1528))) (CONNECT_DATA = (SERVICE_NAME
= your_service_name.your_domain.COM)))"

2. In your solrconfig.xml file, add something like this to reference the above 
listed file:

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">ora-data-config.xml</str>
    </lst>
  </requestHandler>

3. I have Solr1.4 running under Tomcat 6. It looks like you are trying the 
jetty example, but pay mind to getting the cwd pointing to your solr home by 
setting your JNDI path as described in the dataimporthandler wiki.

4. When it blows up, as it did numerous times for me until I got it right, 
check the logs. As I am running under Tomcat, I was able to check 
tomcat_home\logs\catalina.2009-12-14.log to view DIH errors both upon restart 
of Tomcat and after running the DIH.


5. There are some tools to check your JDBC connection you might try before 
pulling too much of your hair out. Try here:
http://otn.oracle.com/sample_code/tech/java/sqlj_jdbc/content.html

Good Luck!
Ken

-Original Message-
From: Turner, Robbin J [mailto:robbin.j.tur...@boeing.com]
Sent: Monday, December 14, 2009 10:27 AM
To: solr-user@lucene.apache.org
Subject: RE: Request Assistance with DIH

How does this help answer my question?  I am trying to use the 
DataImportHandler development console.  The url you suggest assumes I had it 
working already.

Looking at my logs and the response to the Development console, it does not 
appear that the connection to Oracle is being made.

So if someone could offer some configuration/connection setup directions I 
would very much appreciate it.

Thanks
Robbin 

-Original Message-
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Friday, December 11, 2009 8:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Request Assistance with DIH

add ?command=full-import to your url

http://localhost:8983/solr/dataimport?command=full-import

thanks
Joel

On Dec 11, 2009, at 7:45 PM, Robbin wrote:

 I've been trying to use the DIH with oracle and would love it if 
 someone could give me some pointers.  I put the ojdbc14.jar in both 
 the Tomcat lib and solr home/lib.  I created a dataimport.xml and 
 enabled it in the solrconfig.xml.  I go to the http://solr server/ 
 solr/admin/dataimport.jsp.  This all seems to be fine, but I get the 
 default page response and doesn't look like the connection to the 
 oracle server is even attempted.

 I'm using the Solr 1.4 release on Nov 10.
 Do I need an oracle client on the server?  I thought having the ojdbc 
 jar should be sufficient.  Any help or configuration examples for 
 setting this up would be much appreciated.

 Thanks
 Robbin



Re: facet.field problem in SolrParams to NamedList

2009-12-15 Thread Chris Hostetter

: Ej.: q=something field:value becomes q=something value&fq=field:value
: 
: To do this, in the createParser method, I apply a regular expression
: to the qstr param to obtain the fq part, and then I do the following:
: 
: NamedList<Object> paramsList = params.toNamedList();
: paramsList.add(CommonParams.FQ, generatedFilterQuery);
: params = SolrParams.toSolrParams(paramsList);
: req.setParams(params);
...
: SolrParams.toNamedList() was saving the array correctly, but the method
: SolrParams.toSolrParams(NamedList) was doing:
: params.getVal(i).toString(). So, it always loses the array.

I'm having trouble thinking through exactly where the problem is being 
introduced here ... ultimately what it comes down to is that the NamedList 
shouldn't be containing a String[] ... it should be containing multiple 
string values with the same name (fq)

It would be good to make sure all of these methods play nicely with one 
another so some round trip conversions worked as expected -- so if you 
could open a bug for this with a simple example test case that would be 
great, ...but...

for your purposes, i would skip the NamedList conversion alltogether, 
and just use AppendedSolrParams...

  MapSolrParams myNewParams = new MapSolrParams(new HashMap<String,String>());
  myNewParams.getMap().put("fq", generatedFilterQuery);
  myNewParams.getMap().put("q", generatedQueryString);
  req.setParams(new AppendedSolrParams(myNewParams, originalParams));

-Hoss



Re: Log of zero result searches

2009-12-15 Thread stuart yeates

Chris Hostetter wrote:

See Also:  http://en.wikipedia.org/wiki/Thread_hijacking


You may want to update that link, since that wikipedia page has been 
deleted for some time.


cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: wildcard oddity

2009-12-15 Thread Erick Erickson
Do you get the same behavior if you search for gang instead of gets?
I'm wondering if there's something going on with stemEnglishPossessive.

According to the docs you *should* be OK since you set
stemEnglishPossessive=0,
but this would help point in the right direction.

Also, am I correct in assuming that that is the analyzer both for indexing
AND searching?

Best
Erick

On Tue, Dec 15, 2009 at 3:30 PM, Joe Calderon calderon@gmail.comwrote:

 im trying to do a wild card search

 q=item_title:(gets*)   returns no results
 q=item_title:(gets)    returns results
 q=item_title:(get*)    returns results


 seems like * at the end of a token is requiring a character, instead
 of being 0 or more it's acting like 1 or more

 the text im trying to match is The Gang Gets Extreme: Home Makeover
 Edition

 the field uses the following analyzers

 <fieldType name="text_token" class="solr.TextField"
     positionIncrementGap="100" omitNorms="false">
   <analyzer>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="0" catenateAll="1"
         splitOnNumerics="0" splitOnCaseChange="0" stemEnglishPossessive="0"/>
   </analyzer>
 </fieldType>


 is anybody else having similar problems?


 best,
 --joe



Solr client query vs Solr search query

2009-12-15 Thread insaneyogi3008

Hello,

I had this question to ask regarding building a Solr query.  On the Solr
server running on a Linux box, my query that returns results is as follows
(this one, of course, returns the search results):

http://ncbu-cam35-2:17003/apache-solr-1.4.0/profile/select/?q=Bangalore&version=2.2&start=0&rows=10&indent=on


However when I try to access the same Solr server using a webapp on tomcat
if I print out the query it comes out as : 

http://ncbu-cam35-2:17003/apache-solr-1.4.0/profile?q=bangalore&qt=/profile&rows=100&wt=javabin&version=1

Note the second query is missing the select clause, among other things that
follow.  This one does not return the results back to me.

My question is: am I building my query wrong in my client?  Could somebody
show me the way?

With Regards
Sri
-- 
View this message in context: 
http://old.nabble.com/Solr-client-query-vs-Solr-search-query-tp26802513p26802513.html
Sent from the Solr - User mailing list archive at Nabble.com.



store content only of documents

2009-12-15 Thread javaxmlsoapdev

I store documents in a content field, defined as follows in schema.xml:
<field name="content" type="text" indexed="true" stored="true"
    multiValued="true"/>

and following in solrconfig.xml
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.content">content</str>
    <str name="defaultField">content</str>
  </lst>
</requestHandler>

I want to store only the content in this field, but it stores other metadata 
of the document too, e.g. author, timestamp, document type, etc. How can I ask 
Solr to store only the body of the document in this field and not the other 
metadata?

Thanks,

-- 
View this message in context: 
http://old.nabble.com/store-content-only-of-documents-tp26803101p26803101.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter exclusion on query facets?

2009-12-15 Thread Uri Boness
Yes, you can tag filters using the new local params format and then 
explicitly exclude them when providing the facet fields. see: 
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters


Cheers,
Uri
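To make that concrete, here is the wiki's pattern applied to a query facet — tag the filter, then exclude it when computing the facet (the doctype field and dt tag names are illustrative):

```
q=mainquery&fq={!tag=dt}doctype:pdf&facet=on&facet.query={!ex=dt}doctype:pdf
```

With the exclusion in place, the facet.query count reflects the result set as if the tagged doctype filter were not applied, which is exactly the multi-select behavior asked about.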

Mat Brown wrote:

Hi all,

Just wondering if it's possible to do filter exclusion (i.e.,
multiselect faceting) on query facets in Solr 1.4?

Thanks!
Mat

  


Re: Using facets to narrow results with multiword field

2009-12-15 Thread Chris Hostetter
: 
: I'm using facet.field=lbrand and do get good results for eg: Geomax, GeoMax,
: GEOMAX  all of them falls into geomax. But when I'm filtering I do get
: strange results:
: 
: brand:geomax  gives numFound=0
: lbrand:geomax  gives numFound=57 (GEOMAX, GeoMag, Geomag)
: 
: How should I redefine brand to let narrow work correctly?

I'm not sure i understand what it is that isn't working for you ... if you 
are faceting on lbrand then you should filter on lbrand as well ... 
your query for brand:geomax is probably failing because you don't 
actually have geomax as a value for any doc -- which is what you should 
expect, since you didn't use a LowercaseFilter.

correct?




-Hoss



RE: SolrPlugin Guidance

2009-12-15 Thread Chris Hostetter

: Our QParser plugin will perform queries against directory documents and
: return any file document that has the matching directory id(s).  So the
: plugin transforms the query to something like 
: 
: q:+(directory_id:4 directory:10) +directory_id:(4)
...
: Currently the parser plugin is doing the lookup queries via the standard
: request handler.  The problem with this approach is that the look up
: queries are going to be analyzed twice.  This only seems to be a problem

...you lost me there.  if you are taking part of the query, and using it 
to get directory ids, and then using those directory ids to build a new 
query, why are you ever passing the output from one query parser to 
another query parser?

You take the input string, you let the LuceneQParser parse it and use it 
to search against Directory documents, and then you iterate over the 
results, and get an ID from them.  You should be using those IDs directly 
to build your new query.

Honestly: even if you were using those ids to build a query string, and 
then pass that string to the analyzer, i don't see why stemming would 
cause any problems for you if the ids are numbers (like in your example)

-Hoss



Re: using q= , adding fq=

2009-12-15 Thread Chris Hostetter

:  1) adding something like:  q=cat_id:xxx&fq=geo_id:yyy would boost
:  performance?
: 
: 
: For the n > 1 query, yes, adding filters should improve performance 
: assuming it is selective enough.  The tradeoff is memory.

You might even find that something like this is faster...

   q=*:*&fq=cat_id:xxx&fq=geo_id:yyy

...but it can vary based on circumstances (depends a lot on how many 
unique cat_id and geo_id values you have, and how big each of those sets are, 
and how big you make your filterCache)

:  2) we do find problems when we ask for a page=large offset!  ie: 
:  q=cat_id:xxx and geo_id:yyy&start=544545
:  (note that we limit docs to 50 max per resultset).
:  When start is 500 or more, Qtime is =5 seconds while the avg qtime is
:  100 ms

FWIW: limiting the number of rows per request to 50, but not limiting the 
start doesn't make much sense -- the same amount of work is needed to 
handle start=0&rows=5050 and start=5000&rows=50.

There are very few use cases for allowing people to iterate through all 
the rows that also require sorting.


-Hoss



Text formatting lost

2009-12-15 Thread Mike Aymard

Hi,

I'm a newbie and have a question about the text that is stored and then 
returned from a query. The field in question is of type text, and is indexed and 
stored. The original text included various blank lines (line feeds), but when 
the text field is returned as the result of a query, all of the blank lines 
and extra spaces have been removed. Since I am storing the content for the 
purpose of displaying it, I need the original format to be preserved. Is this 
possible? I tried changing it to indexed=false and using a copyField to copy 
it to the general text field for indexing, but this didn't help.

Thanks!
Mike
  
_
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/177141665/direct/01/