Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
remove the slave section from your solrconfig. It should be fine
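
Something like this, roughly, is what that leaves on the slave side (host names and paths below are placeholders; the section in question is the <lst name="slave"> block of the ReplicationHandler):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <!-- no <lst name="slave"> block: nothing hardcoded, and no polling -->
  </requestHandler>

The masterUrl then goes on the request itself, e.g.
http://slave_host:port/solr/replication?command=fetchindex&masterUrl=http://master_host:port/solr/replication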

On Tue, Dec 1, 2009 at 6:59 AM, William Pierce  wrote:
> Hi, Joe:
>
> I tried with the "fetchIndex" all lower-cased, and still the same result.
> What do you specify for masterUrl in the solrconfig.xml on the slave?   it
> seems to me that if I remove the element,  I get the exception I wrote
> about.   If I set  it to some dummy url, then I get an invalid url message
> when I run the command=details on the slave replication handler.
>
> What I am doing does not look out of the ordinary.   I want to control the
> masterurl and the time of replication by myself.  As such I want neither the
> masterUrl nor the polling interval in the config file.  Can you share
> relevant snippets of your config file and the exact url your code is
> generating?
>
> Thanks,
>
> - Bill
>
> --
> From: "Joe Kessel" 
> Sent: Monday, November 30, 2009 3:45 PM
> To: 
> Subject: RE: How to avoid hardcoding masterUrl in slave solrconfig.xml?
>
>>
>> I do something very similar and it works for me.  I noticed on your URL
>> that you have a mixed case fetchIndex, which the request handler is checking
>> for fetchindex, all lowercase.  If it is not that simple I can try to see
>> the exact url my code is generating.
>>
>>
>>
>> Hope it helps,
>>
>> Joe
>>
>>> From: evalsi...@hotmail.com
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
>>> Date: Mon, 30 Nov 2009 13:48:38 -0800
>>>
>>> Folks:
>>>
>>> Sorry for this repost! It looks like this email went out twice
>>>
>>> Thanks,
>>>
>>> - Bill
>>>
>>> --
>>> From: "William Pierce" 
>>> Sent: Monday, November 30, 2009 1:47 PM
>>> To: 
>>> Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?
>>>
>>> > Folks:
>>> >
>>> > I do not want to hardcode the masterUrl in the solrconfig.xml of my
>>> > slave. If the masterUrl tag is missing from the config file, I am
>>> > getting an exception in solr saying that the masterUrl is required.
>>> > So I set it to some dummy value, comment out the poll interval
>>> > element, and issue a replication command manually like so:
>>> >
>>> > http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication
>>> >
>>> > Now no internal exception, solr responds with a status "OK" for the
>>> > above request, the tomcat logs show no error but the index is not
>>> > replicated. When I issue the details command to the slave, I see that
>>> > it ignored the masterUrl on the command line but instead complains
>>> > that the master url in the config file (which I had set to a dummy
>>> > value) is not correct.
>>> >
>>> > (Just fyi, I have tried sending in the masterUrl to the above command
>>> > with url encoding and also without. In both cases, I got the same
>>> > result.)
>>> >
>>> > So how exactly do I avoid hardcoding the masterUrl in the config
>>> > file? Any pointers/help will be greatly appreciated!
>>> >
>>> > - Bill
>>>
>>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: HTML Stripping slower in Solr 1.4?

2009-11-30 Thread Koji Sekiguchi

Robin,

Thank you for reporting this. There could be a performance degradation of the
HTML stripper in 1.4. I opened a ticket in Lucene:

https://issues.apache.org/jira/browse/LUCENE-2098

Koji

--
http://www.rondhuit.com/en/



Re: Phrase Query Issue

2009-11-30 Thread ravicv

Thanks Erick. But is there any way to get phrase search and normal search
together?

For example, if I search for "test" it should only give results for test, but
not tests. And if I search for tests it should give all results matching
tests (this should include stemmed results also).

Please clarify how this scenario is normally implemented.

I am using field title with text type which contains
SnowballPorterFilterFactory filter.

And another field title_exact without the SnowballPorterFilterFactory filter.

Both are copied using copy field.

 
 
But since the text type has a SnowballPorterFilterFactory filter, it stores
tests as test. So whenever I search for "test", documents with tests also come
back in my search results.
Please tell me a way to avoid it.
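
Roughly what I am aiming for (the text_exact type name below is just a label
for the analyzer chain without the stemmer; the rest follows the setup
described above):

  <field name="title"       type="text"       indexed="true" stored="true"/>
  <field name="title_exact" type="text_exact" indexed="true" stored="true"/>
  <copyField source="title" dest="title_exact"/>

  q=title_exact:test    should match test but not tests
  q=title:tests         stemmed, so it matches both test and tests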

Thanks
Ravichandra
-- 
View this message in context: 
http://old.nabble.com/Phrase-Query-Issue-tp22863529p26586788.html
Sent from the Solr - User mailing list archive at Nabble.com.



HTML Stripping slower in Solr 1.4?

2009-11-30 Thread Robin Wojciki
Hello,

Our schema in Solr 1.3 looked like:




It takes 30s to index 1500 docs. When we run the same in Solr 1.4 it takes 70s.

I noticed that HTMLStripStandardTokenizerFactory was deprecated. So
changed the schema to:




It still takes 70s.

Instead, if I use the schema:



It takes 30s in both 1.3 and 1.4.

I am not sure if HTMLStrip has become slower in 1.4 or if HTML stripping
impacts performance downstream in 1.4. Before I started writing a unit test
with a TokenizerChain, I wanted to check if I am doing something
fundamentally wrong.

Robin


Re: *:* Returning no results

2009-11-30 Thread Erik Hatcher
Are you sure you're hitting the same core?  Did a commit?  Are you  
possibly using the dismax query parser (where *:* is fairly  
meaningless)?
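
(If dismax is in the picture, the usual match-all idiom is an empty q plus
q.alt=*:*, e.g. ...&defType=dismax&q=&q.alt=*:* -- just a general note, not
verified against this particular setup.)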


Erik

On Nov 30, 2009, at 3:54 PM, Giovanni Fernandez-Kincade wrote:


Hi,
I created a brand new core (on Solr 1.4), added a few documents and  
then searched for *:*, but got no results. Strangely enough, if I  
search for a specific document I know is in the index, like say  
"versionId:3", I get the expected result.


Any ideas on why that might be?

Thanks,
Gio.




Re: nested solr queries

2009-11-30 Thread Chris Hostetter

: thanks for your help so do you think I should execute solr queries twice ?
: or is there any other workarounds

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss



Re: Sorting Facets by First Occurrence

2009-11-30 Thread Cory G Watson

On Nov 30, 2009, at 5:15 PM, Chris Hostetter wrote:
> All of Solr's existing faceting code is based on the DocSet which is an 
> unordered set of all matching documents -- i suspect your existing 
> application only reordered the facets based on their appearance in the 
> first N docs (possibly just the first page, but maybe more) so doing 
> something like that using the DocList would certainly be feasible.  if 
> your number of facet constraints is low enough that they are all returned 
> everytime then doing it in the client is probably the easiest -- but if 
> you have to worry about facet.limit preventing something from being 
> returned that might otherwise bubble to the top of your list when you 
> reorder it then you'll need to customise the FacetComponent.


You are right, I left out a few important bits there.  Tried to be brief and 
succeeded in being vague? :)

Effectively I was ordering the facet based on the N documents in the current 
"page".  My thought that his was a good feature for a facet now seems 
incorrect, as my needs are limited to the current page, not the whole set of 
results.

I'll probably elect to fetch data from the facets based on the page of 
documents I'm showing.  Thanks for the discussion, it helped! :)

Cory G Watson
http://www.onemogin.com





Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce

Hi, Joe:

I tried with the "fetchIndex" all lower-cased, and still the same result. 
What do you specify for masterUrl in the solrconfig.xml on the slave?   it 
seems to me that if I remove the element,  I get the exception I wrote 
about.   If I set  it to some dummy url, then I get an invalid url message 
when I run the command=details on the slave replication handler.


What I am doing does not look out of the ordinary.   I want to control the 
masterurl and the time of replication by myself.  As such I want neither the 
masterUrl nor the polling interval in the config file.  Can you share 
relevant snippets of your config file and the exact url your code is 
generating?


Thanks,

- Bill

--
From: "Joe Kessel" 
Sent: Monday, November 30, 2009 3:45 PM
To: 
Subject: RE: How to avoid hardcoding masterUrl in slave solrconfig.xml?



I do something very similar and it works for me.  I noticed on your URL 
that you have a mixed case fetchIndex, which the request handler is 
checking for fetchindex, all lowercase.  If it is not that simple I can 
try to see the exact url my code is generating.




Hope it helps,

Joe


From: evalsi...@hotmail.com
To: solr-user@lucene.apache.org
Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
Date: Mon, 30 Nov 2009 13:48:38 -0800

Folks:

Sorry for this repost! It looks like this email went out twice

Thanks,

- Bill

--
From: "William Pierce" 
Sent: Monday, November 30, 2009 1:47 PM
To: 
Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?

> Folks:
>
> I do not want to hardcode the masterUrl in the solrconfig.xml of my
> slave. If the masterUrl tag is missing from the config file, I am getting an
> exception in solr saying that the masterUrl is required. So I set it to
> some dummy value, comment out the poll interval element, and issue a
> replication command manually like so:
>
> http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication
>
> Now no internal exception, solr responds with a status "OK" for the
> above request, the tomcat logs show no error but the index is not replicated.
> When I issue the details command to the slave, I see that it ignored
> the masterUrl on the command line but instead complains that the master url
> in the config file (which I had set to a dummy value) is not correct.
>
> (Just fyi, I have tried sending in the masterUrl to the above command
> with url encoding and also without. In both cases, I got the same result.)
>
> So how exactly do I avoid hardcoding the masterUrl in the config
> file? Any pointers/help will be greatly appreciated!
>
> - Bill







RE: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread Joe Kessel

I do something very similar and it works for me.  I noticed on your URL that 
you have a mixed case fetchIndex, which the request handler is checking for 
fetchindex, all lowercase.  If it is not that simple I can try to see the exact 
url my code is generating.

 

Hope it helps,

Joe
 
> From: evalsi...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?
> Date: Mon, 30 Nov 2009 13:48:38 -0800
> 
> Folks:
> 
> Sorry for this repost! It looks like this email went out twice
> 
> Thanks,
> 
> - Bill
> 
> --
> From: "William Pierce" 
> Sent: Monday, November 30, 2009 1:47 PM
> To: 
> Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?
> 
> > Folks:
> >
> > I do not want to hardcode the masterUrl in the solrconfig.xml of my slave. 
> > If the masterUrl tag is missing from the config file, I am getting an 
> > exception in solr saying that the masterUrl is required. So I set it to 
> > some dummy value, comment out the poll interval element, and issue a 
> > replication command manually like so:
> >
> > http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication
> >
> > Now no internal exception, solr responds with a status "OK" for the above 
> > request, the tomcat logs show no error but the index is not replicated. 
> > When I issue the details command to the slave, I see that it ignored the 
> > masterUrl on the command line but instead complains that the master url in 
> > the config file (which I had set to a dummy value) is not correct.
> >
> > (Just fyi, I have tried sending in the masterUrl to the above command with 
> > url encoding and also without. in both cases, I got the same result.)
> >
> > So how exactly do I avoid hardcoding the masterUrl in the config 
> > file? Any pointers/help will be greatly appreciated!
> >
> > - Bill 
> 
  

Re: Sorting Facets by First Occurrence

2009-11-30 Thread Chris Hostetter

: I'm working on replacing a custom, internal search implementation with
: Solr.  I'm having great success, with one small exception.
...
: For example, if a search yielded 10 results, 1 - 10, and hit 1 is in
: category 'Toys', hit 2 through 9 are in 'Sports' and the last is in
: 'Household' then the facet would look like:

...and what if you had 100,000 results?  did it really look at every doc 
that matched a query to decide the facet ordering, or did it stop at 10?

: So, the question I _really_ have is: how can I implement this feature?

It depends ... how was the previous version implemented?

(you mentioned it was a custom internal solution, so i assume you have 
access to the code and can explain it to us in pseudo code ... that would 
give people the best insight into what exactly it was doing to make a 
Solr based comparison)

:  I could examine the results i'm returned and create my own facet
: order from it, but I thought this might be useful for others.  I don't
: know my way around Solr's source, so I though dropping a note to the
: list would be faster than code spelunking with no light.

All of Solr's existing faceting code is based on the DocSet which is an 
unordered set of all matching documents -- i suspect your existing 
application only reordered the facets based on their appearance in the 
first N docs (possibly just the first page, but maybe more) so doing 
something like that using the DocList would certainly be feasible.  if 
your number of facet constraints is low enough that they are all returned 
everytime then doing it in the client is probably the easiest -- but if 
you have to worry about facet.limit preventing something from being 
returned that might otherwise bubble to the top of your list when you 
reorder it then you'll need to customise the FacetComponent.


-Hoss



RE: schema-based Index-time field boosting

2009-11-30 Thread Chris Hostetter

: I am talking about field boosting rather than document boosting, ie. I
: would like some fields (say eg. title) to be "louder" than others,
: across ALL documents.  I believe you are at least partially talking
: about document boosting, which clearly applies on a per-document basis.

index time boosts are all the same -- it doesn't matter if they are field 
boosts or document boosts -- a document boost is just a field boost for 
every field in the document.

: If it helps, consider a schema version of the following, from
: org.apache.solr.common.SolrInputDocument:
: 
:   /**
:* Adds a field with the given name, value and boost.  If a field with
: the name already exists, then it is updated to
:* the new value and boost.
:*
:* @param name Name of the field to add
:* @param value Value of the field
:* @param boost Boost value for the field
:*/
:   public void addField(String name, Object value, float boost ) 

...

: Where a constant boost value is applied consistently to a given field.
: That is what I was mistakenly hoping to achieve in the schema.  I still
: think it would be a good idea BTW.  Regards,

But now we're right back to what i was trying to explain before: index 
time boost values like these are only used as a multiplier in the 
fieldNorm.  when included as part of the document data, you can specify a 
fieldBoost for fieldX of docA that's greater than the boost for fieldX 
of docB and that will make docA score higher than docB when fieldX 
contains the same number of matches and is the same length -- but if you 
apply a constant boost of B to fieldX for every doc (which is what a 
feature to hardcode boosts in schema.xml might give you) then the net 
effect would be zero when scoring docA and docB, because the fieldNorm's 
for fieldX in both docs would include the exact same multiplier.
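
(to make that concrete with made-up numbers: if the lengthNorm for fieldX is
0.5 in both docA and docB, a uniform boost of 4 just turns both fieldNorms
into 0.5 * 4 = 2.0, so the relative scores of the two docs are exactly what
they would have been with no boost at all.)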



-Hoss



Re: schema-based Index-time field boosting

2009-11-30 Thread Chris Hostetter

: Coming in a bit late but I would like a variant that is not a No-OP.
: Think of something like title:searchstring^10 OR catch_all:searchstring
: Of course I can always add the boosting at query time but it would make
: life easier if I could define a default boost in the schema so that my
: query could just be title:searchstring OR catch_all:searchstring
: but still get the boost for the title field.

That would be a query time boost -- not an index time boost.  An index 
time boost is a very specific concept that refers to increasing the 
fieldNorm of a field in a specific document.

What you are describing would be a query time boost, in which solr knows 
that certain query clauses should be worth more.  You can 
already do essentially this exact same thing using the dismax parser, but i 
suppose it would also be possible to modify the LuceneQParser so that it 
could take in configuration that would tell it to apply different default 
query boosts to different fields.
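
(A sketch of the dismax route, using the field names from the example above:
defType=dismax&qf=title^10 catch_all&q=searchstring gives the title clause ten
times the weight without the boost appearing in the user's query.)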



-Hoss



Re: $DeleteDocbyQuery in solr 1.4 is not working

2009-11-30 Thread cpmoser

Ok, I think I figured out what might be happening.  It appears that the
DataImporter issues the commit command without the expungeDeletes option set
to true (default in 1.4 for a commit command is for expungeDeletes to be set
to false).  You can get around this by issuing the commit command manually:

curl "http://localhost:8983/solr/dataimport?command=delta-import&commit=false&optimize=false"
curl "http://localhost:8983/solr/update" --data-binary '<commit expungeDeletes="true"/>' -H "Content-type:text/xml; charset=utf-8"

That fixed the issue for me, although ideally the DataImportHandler should
run with the expungeDeletes option set to true, so that this could all
happen in one command without having to wait for the DIH to finish before
issuing the commit command.  

Is there a way to set the expungeDeletes option for the commit command of
the DataImportHandler, so that delta imports with deletes can happen
automatically?


Mark.El wrote:
> 
> Thanks I will look into it!
> 
> 

-- 
View this message in context: 
http://old.nabble.com/%24DeleteDocbyQuery-in-solr-1.4-is-not-working-tp26376265p26583260.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce

Folks:

Sorry for this repost!  It looks like this email went out twice

Thanks,

- Bill

--
From: "William Pierce" 
Sent: Monday, November 30, 2009 1:47 PM
To: 
Subject: How to avoid hardcoding masterUrl in slave solrconfig.xml?


Folks:

I do not want to hardcode the masterUrl in the solrconfig.xml of my slave. 
If the masterUrl tag is missing from the config file, I am getting an 
exception in solr saying that the masterUrl is required.  So I set it to 
some dummy value,  comment out the poll interval element,  and issue a 
replication command manually like so:


http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication

Now no internal exception,  solr responds with a status "OK" for the above 
request,  the tomcat logs show no error but the index is not replicated. 
When I issue the details command to the slave,  I see that it ignored the 
masterUrl on the command line but instead complains that the master url in 
the config file (which I had set to a dummy value) is not correct.


(Just fyi, I have tried sending in the masterUrl to the above command with 
url encoding and also without.  in both cases, I got the same result.)


So how exactly do I avoid hardcoding the masterUrl in the config 
file?  Any pointers/help will be greatly appreciated!


- Bill 




How to avoid hardcoding masterUrl in slave solrconfig.xml?

2009-11-30 Thread William Pierce
Folks:

I do not want to hardcode the masterUrl in the solrconfig.xml of my slave.  If 
the masterUrl tag is missing from the config file, I am getting an exception in 
solr saying that the masterUrl is required.  So I set it to some dummy value,  
comment out the poll interval element,  and issue a replication command 
manually like so:

http://localhost:port/postings/replication?command=fetchIndex&masterUrl=http://localhost:port/postingsmaster/replication

Now no internal exception,  solr responds with a status "OK" for the above 
request,  the tomcat logs show no error but the index is not replicated.  When 
I issue the details command to the slave,  I see that it ignored the masterUrl 
on the command line but instead complains that the master url in the config 
file (which I had set to a dummy value) is not correct.

(Just fyi, I have tried sending in the masterUrl to the above command with url 
encoding and also without.  in both cases, I got the same result.)

So how exactly do I avoid hardcoding the masterUrl in the config file?  
Any pointers/help will be greatly appreciated!

- Bill

Sorting Facets by First Occurrence

2009-11-30 Thread Cory Watson
I'm working on replacing a custom, internal search implementation with
Solr.  I'm having great success, with one small exception.

When implementing our version of faceting, one of our facets had a
peculiar sort order.  It was dictated by the order in which the field
occurred in the results.  The first time a value occurred it was added
to the list and regardless of the number of times it occurred, it
always stayed at the top.

For example, if a search yielded 10 results, 1 - 10, and hit 1 is in
category 'Toys', hit 2 through 9 are in 'Sports' and the last is in
'Household' then the facet would look like:

facet.fields -> category -> [ Toys: 1, Sports: 8 Household: 1 ]

The facet.sort only gives me the option to sort highest count first or
alphabetically.

So, the question I _really_ have is: how can I implement this feature?
 I could examine the results i'm returned and create my own facet
order from it, but I thought this might be useful for others.  I don't
know my way around Solr's source, so I though dropping a note to the
list would be faster than code spelunking with no light.

-- 
Cory 'G' Watson
http://www.onemogin.com


Thought that masterUrl in slave solrconfig.xml is optional...

2009-11-30 Thread William Pierce
Folks:

Reading the wiki,  I saw the following statement:
  "Force a fetchindex on slave from master command : 
http://slave_host:port/solr/replication?command=fetchindex 

  It is possible to pass on extra attribute 'masterUrl' or other attributes 
like 'compression' (or any other parameter which is specified in the  tag) to do a one time replication from a master. This obviates 
the need for hardcoding the master in the slave. "
In my case, I cannot hardcode the masterurl in the config file.  I want a cron 
job to issue the replication commands for each of the slaves.

So I issued the following command:

http://localhost/postings/replication?command=fetchIndex&masterUrl=http%3a%2f%2flocalhost%2fpostingsmaster

I got the following exception:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for 
more detailed information on what may be wrong. If you want solr to continue 
after configuration errors, change: 
<abortOnConfigurationError>false</abortOnConfigurationError> in null 
- 

org.apache.solr.common.SolrException: 'masterUrl' is required for a slave at 
org.apache.solr.handler.SnapPuller.<init>(SnapPuller.java:126) at 

other lines removed

Why is the error message asking me to specify the masterUrl in the config file when 
the wiki states that this is optional?   Or, am I understanding this 
incorrectly?

Thanks,

- Bill




RE: *:* Returning no results

2009-11-30 Thread Giovanni Fernandez-Kincade
Hmm. When I include debugQuery=on I get two results (which is accurate):

  <result name="response" numFound="2" start="0"> ... </result>

Otherwise I get

  <result name="response" numFound="0" start="0"/>



Why would you get different results with debugging on? Does anything look 
peculiar here?



<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain">
    <str>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
    </str>
    <str>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">93.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      (0.0 for each search component)
    </lst>
    <lst name="process">
      <double name="time">93.0</double>
      (78.0 for the query component, 15.0 for the debug component, 0.0 for the rest)
    </lst>
  </lst>
</lst>


-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org] 
Sent: Monday, November 30, 2009 4:02 PM
To: solr-user@lucene.apache.org
Cc: Giovanni Fernandez-Kincade
Subject: Re: *:* Returning no results

Add debugQuery=on to give you clues.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/


On Nov 30, 2009, at 3:54 PM, Giovanni Fernandez-Kincade wrote:

> Hi,
> I created a brand new core (on Solr 1.4), added a few documents and then 
> searched for *:*, but got no results. Strangely enough, if I search for a 
> specific document I know is in the index, like say "versionId:3", I get the 
> expected result.
> 
> Any ideas on why that might be?
> 
> Thanks,
> Gio.







Re: *:* Returning no results

2009-11-30 Thread Smiley, David W.
Add debugQuery=on to give you clues.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/


On Nov 30, 2009, at 3:54 PM, Giovanni Fernandez-Kincade wrote:

> Hi,
> I created a brand new core (on Solr 1.4), added a few documents and then 
> searched for *:*, but got no results. Strangely enough, if I search for a 
> specific document I know is in the index, like say "versionId:3", I get the 
> expected result.
> 
> Any ideas on why that might be?
> 
> Thanks,
> Gio.







*:* Returning no results

2009-11-30 Thread Giovanni Fernandez-Kincade
Hi,
I created a brand new core (on Solr 1.4), added a few documents and then 
searched for *:*, but got no results. Strangely enough, if I search for a 
specific document I know is in the index, like say "versionId:3", I get the 
expected result.

Any ideas on why that might be?

Thanks,
Gio.


Online article: Text Search, your Database or Solr

2009-11-30 Thread Smiley, David W.
Hey folks.  I wrote a piece comparing Solr and database based text search.  
It's obviously pro-Solr ;-)  Of course if you're reading this then you're 
already drinking the cool-aid, so to speak, but you may find this article 
interesting.

http://www.packtpub.com/article/text-search-your-database-or-solr

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/






Re: Fulltext crawler

2009-11-30 Thread Smiley, David W.
Start reading about midway through page 224.

Additionally, you might want to get the online supplement available at 
packtpub.com.

FYI my co-author Eric Pugh wrote the last 3 chapters which includes this.

~ David

On Nov 30, 2009, at 1:37 PM, Jörg Agatz wrote:

> book? i order "Solr 1.4" today, i see some examples in this book?



Re: Fulltext crawler

2009-11-30 Thread Jörg Agatz
Book? I ordered "Solr 1.4" today. Will I see some examples in this book?


Re: Multi-Term Synonyms

2009-11-30 Thread brad anderson
Thanks a lot Patrick. I think your solution may work for my needs as well.

-Brad

2009/11/26 Patrick Jungermann 

> Hi Brad,
>
> I was trying this, too, and there is a possibility how to get multi-term
> synonyms to work properly. I wrote my solution already on this list.
>
> My solution was as follows:
>
> [cite]
> after your hints that had partially confirmed my considerations, I had
> made some tests with the FieldQParser. At the beginning, I had some
> problems, but finally, I was able to solve the problem of multi-word
> synonyms at query time in a way that is suitable for us - and possibly
> for others, too.
>
> In my solution, I re-used the FieldQParserPlugin. At first, I ported it
> to the new API (incrementToken instead of next, etc.) and then I
> modified the code so that no PhraseQueries will be created but only
> BooleanQueries.
>
> Now with my new QParserPlugin that is based on the FieldQParserPlugin, it's
> possible to search for things like "foo bar baz", where "foo bar" has to
> be changed to "foo_bar" and where at the end the tokens "foo_bar" and
> "baz" will be created, so that both could match independently.
> [/cite]
>
> Our current version is re-worked again, so that also multi-field queries
> are possible.
>
> If you want to use such a solution, you probably have to go without
> complex query parsing et cetera. You also have to write your own modified
> QParser that fits your special needs. Also some higher features, like
> those offered by other QParsers, could be integrated. It's all up to
> you and your needs.
>
>
>
> Patrick
>
>
>
> brad anderson schrieb:
> > Thanks for the help. Can't believe I missed that part in the wiki.
> >
> > 2009/11/24 Tom Hill 
> >
> >> Hi Brad,
> >>
> >>
> >> I suspect that this section from the wiki for SynonymFilterFactory might
> be
> >> relevant:
> >>
> >>
> >>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> >>
> >> *"Keep in mind that while the SynonymFilter will happily work with
> synonyms
> >> containing multiple words (ie: "**sea biscuit, sea biscit,
> seabiscuit**")
> >> The recommended approach for dealing with synonyms like this, is to
> expand
> >> the synonym when indexing. This is because there are two potential
> issues
> >>   that can arise at query time:*
> >>
> >>   1.
> >>
> >>   *The Lucene QueryParser tokenizes on white space before giving any
> text
> >>   to the Analyzer, so if a person searches for the words **sea biscit**
> the
> >>   analyzer will be given the words "sea" and "biscit" seperately, and
> will
> >> not
> >>   know that they match a synonym."*
> >>
> >>   ...
> >>
> >> Tom
> >>
> >> On Tue, Nov 24, 2009 at 10:47 AM, brad anderson  >>> wrote:
> >>> Hi Folks,
> >>>
> >>> I was trying to get multi term synonyms to work. I'm experiencing some
> >>> strange behavior and would like some feedback.
> >>>
> >>> In the synonyms file I have the line:
> >>>
> >>> thomas, boll holly, thomas a, john q => tom
> >>>
> >>> And I have a document with the text field as;
> >>>
> >>> tom
> >>>
> >>> However, when I do a search on boll holly, it does not return the
> >> document
> >>> with tom. The same thing happens if I do a query on john q. But if I do
> a
> >>> query on thomas, it gives me the document. Also, if I quote "boll
> holly"
> >> or
> >>> "john q" it gives back the document.
> >>>
> >>> When I look at the analyzer page on the solr admin page, it is
> >> transforming
> >>> "boll holly" to "tom" when it isn't quoted. Why is it that it is not
> >>> returning the document? Is there some configuration I can make so it
> does
> >>> return the document if I do an unquoted search on "boll holly"?
> >>>
> >>> My synonym filter is defined as follows, and is only defined on the
> query
> >>> side:
> >>>
> >>>  >>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>>
> >>>
> >>> I've also tried changing the synonym file to be
> >>>
> >>> tom, thomas, boll holly, thomas a, john q
> >>>
> >>> This produces the same results.
> >>>
> >>> Thanks,
> >>> Brad
> >>>
> >
>
>


RE: SolrPlugin Guidance

2009-11-30 Thread Vauthrin, Laurent
Thanks for the response but I'm still confused.  I don't see how a QParser will 
create multiple queries that need to be sent to shards sequentially.

Here's a more detailed example of what we're doing:

We're indexing documents in Solr that are somewhat equivalent to files.  We 
want users to be able to search by a file's directory.  We're shying away from 
the approach of storing the directory as an attribute because renaming a 
directory could mean re-indexing tens of thousands of file documents.  There 
are other file attributes that would have the same effect if they are modified.

So in an effort to avoid many large reindex jobs, we're trying to index both 
file documents and directory documents.  We don't want search users to have to 
deal with this implementation detail so we're looking to write a plugin that 
would do this for them.

e.g. For the following query that looks for a file in a directory:
q=+directory_name:"myDirectory" +file_name:"myFile"

We'd need to decompose the query into the following two queries:
1. q=+directory_name:"myDirectory"&fl=directory_id
2. q=+file_name:"myFile" +directory_id:(results from query #1)

I guess I'm looking for the following feedback:
- Does this sound crazy?  
- Is the QParser the right place for this logic?  If so, can I get a little 
more guidance on how to decompose the queries there (filter queries maybe)?

Thanks,
Laurent Vauthrin

-Original Message-
From: solr-user-return-29672-laurent.vauthrin=disney@lucene.apache.org 
[mailto:solr-user-return-29672-laurent.vauthrin=disney@lucene.apache.org] 
On Behalf Of Shalin Shekhar Mangar
Sent: Wednesday, November 25, 2009 5:42 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrPlugin Guidance

On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent <
laurent.vauth...@disney.com> wrote:

>
> Our team is trying to make a Solr plugin that needs to parse/decompose a
> given query into potentially multiple queries.  The idea is that we're
> trying to abstract a complex schema (with different document types) from
> the users so that their queries can be simpler.
>
>
>
> So basically, we're trying to do the following:
>
>
>
> 1.   Decompose query A into query B and query C
>
> 2.   Send query B to all shards and plug query B's results into
> query C
>
> 3.   Send Query C to all shards and pass the results back to the
> client
>
>
>
> I started trying to implement this by subclassing the SearchHandler but
> realized that I would not have access to HttpCommComponent.  Then I
> tried to replicate the SearchHandler class but realized that I might not
> have access to fields I would need in ShardResponse.  So I figured I
> should step back and get advice from the mailing list now J.  What is
> the best plugin point for decomposing a query into multiple queries so
> that all resultant queries can be sent to each shard?
>
>
>
All queries are sent to all shards? If yes, it sounds like a job for a
custom QParser.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Fulltext crawler

2009-11-30 Thread Smiley, David W.
And of course Heritrix   http://crawler.archive.org/
I think this one's quite cool.  You'll see example usage in my book.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Nov 26, 2009, at 5:01 AM, Shalin Shekhar Mangar wrote:

> On Thu, Nov 26, 2009 at 1:54 PM, Jörg Agatz wrote:
> 
>> *Hey guys*, I am searching for a fulltext crawler for Solr, to index HTML, OpenOffice
>> and MS Office documents, PDF and many more formats.
>> How do you index the data?
>> 
>> Maybe you can help me to find a crawler.
>> 
> 
> If you need a web crawler, look at Nutch. Otherwise, you may need to build
> something using Droids or Aperture.
> 
> http://lucene.apache.org/nutch/
> http://incubator.apache.org/droids/
> http://aperture.sourceforge.net/
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.




Solr and Geoserver/Mapserver

2009-11-30 Thread gwk

Hello,

While my current implementation of searching on a map works, rendering 
hundreds of markers in an embedded Google map tends to slow browsers on 
slower computers (or fast computers running internet explorer :\) down 
to a crawl. I'm looking into generating tiles with the markers rendered 
on it on the server to improve performance (GTileLayerOverlay) Does 
anyone have any experience using geoserver, mapserver or a similar 
application in combination with Solr so that the application can 
generate tiles from a Solr query and tile position/zoom level?


Regards,

gwk


Re: search on tomcat server

2009-11-30 Thread Shalin Shekhar Mangar
On Mon, Nov 30, 2009 at 9:55 PM, Jill Han  wrote:

> I got solr running on the tomcat server,
> http://localhost:8080/solr/admin/
>
> After I enter a search word, such as, solr, then hit Search button, it
> will go to
>
> http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&in
> dent=on
>
>  and display
>
>   
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>     <lst name="params">
>       <str name="rows">10</str>
>       <str name="start">0</str>
>       <str name="indent">on</str>
>       <str name="q">solr</str>
>       <str name="version">2.2</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
> </response>
>
>  My question is what is the next step to search files on tomcat server?
>
>
>
Looks like you have not added any documents to Solr. See the "Indexing
Documents" section at http://wiki.apache.org/solr/#Search_and_Indexing

-- 
Regards,
Shalin Shekhar Mangar.


search on tomcat server

2009-11-30 Thread Jill Han
I got solr running on the tomcat server,
http://localhost:8080/solr/admin/

After I enter a search word, such as, solr, then hit Search button, it
will go to 

http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&in
dent=on

 and display



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">solr</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

  My question is what is the next step to search files on tomcat server?

 

 

  Thanks,

 

Jill

 

 



Re: Status of Spelt integration

2009-11-30 Thread Toby Cole

Hi Andrew,
	We ended up abandoning the spelt integration as the built in solr  
spellchecking improved so much during our project. Also, if you did go  
the route of using spelt, I'd implement it as a spellcheck plugin  
(which didn't exist as a concept when we started trying to shoehorn  
spelt into solr).

Regards, Toby.

On 30 Nov 2009, at 11:29, Andrey Klochkov wrote:


Hi all

I searched through the mail-list archives and saw that sometime ago  
Toby

Cole was going to integrate a spellchecker named Spelt into Solr. Does
anyone know what's the status of this? Anyone tried to use it with  
Solr? Does

it make sense to try it instead of standard spell checker?

Some links on the subject:
http://markmail.org/message/cqt4qtzzwyceltqu#query:+page:1+mid:cqt4qtzzwyceltqu+state:results
http://markmail.org/search/?q=spelt#query:spelt+page:1+mid:krzofzojhg7hmms7+state:results
http://groups.google.com/group/spelt

--
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


Solr plugin or something else for custom work?

2009-11-30 Thread javaxmlsoapdev

I have a requirement where I am indexing attachments. Attachments hang off
of a database entity (table). I also need to include some meta-data info from
the database table as part of the index. I am trying to find the best way to
implement this, using a custom handler or something, where the custom handler gets all
required db records (those include the document path) by consuming a web service
(I can expose a method from my application as a web service) and then
iterates through the list (returned by the web service) and indexes the required meta
data along with indexing the attachments (the attachment path is part of the meta data
of an entity). Has anyone tried something like this, or have suggestions on how
best to implement this requirement?
-- 
View this message in context: 
http://old.nabble.com/Solr-plugin-or-something-else-for-custom-work--tp26577014p26577014.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Word Concat 0 Results

2009-11-30 Thread AHMET ARSLAN
> I have a quick question for anyone with an idea how to
> solve this.  We have
> times when our users don't put spaces between words. 
> So for instance
> "airmax" returns 0 results but "air max" has at least
> 100 results.  Other
> than adding to the synonyms file every time, is there a
> more programmatic
> way we could possibly understand this scenario and return
> correct results.


Without a manual synonym table lookup, it would be very hard to recognize airmax 
at query time and split it into air max.

But at index time you can do it using a modified version of ShingleFilterFactory. 
Simply put, it will concat all token n-grams.

Change the 
public static final String TOKEN_SEPARATOR = " ";
to 
public static final String TOKEN_SEPARATOR = "";
in org.apache.lucene.analysis.shingle.ShingleFilter

Also you need its Factory class to integrate it into solr.
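
A rough sketch of where such a factory would be wired in (the factory class
name here is made up and stands for whatever factory you write around the
modified ShingleFilter; the surrounding tokenizer and filters are only an
example):

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.example.ConcatShingleFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>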

The input document ( "but air max has" ) at index time will be tokenized into :

but => word
butair => shingle
air => word
airmax => shingle
max => word
maxhas => shingle
has => word

And the query airmax will match that document. But this solution increases your 
index size. It is better to write all possible words into the synonym.txt file 
manually. There is a similar discussion suggesting this in the lucene-java-users 
group: 
http://old.nabble.com/splitting-words-to26573829.html#a26573829

Hope this helps.





RE: Child entity not getting index using DIH

2009-11-30 Thread Gupta, Saurabh
Shalin,

Thanks for your suggestion. A few questions though:

1. Are multi-valued fields essential to denormalizing the database in indexes?
2. What is the role of  tag in data-config xml file?
3. How does SolrJ or any other framework convert the returned results back into 
an object model?

Thanks,

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Monday, November 30, 2009 12:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Child entity not getting index using DIH

On Mon, Nov 30, 2009 at 8:17 AM, Gupta, Saurabh wrote:

> Hi,
>
> I am using Solr to index two database tables with a very simple
> relationship.
>
> Table 1: PRODUCT has ID as PK
> Table 2: VERSION has ID as PK and PRODUCT_ID as PK
>
> There is a 1:m relationship between PRODUCT AND VERSION.
>
> When I run full-import, for some reason only 1 row in VERSION table is
> indexed per PRODUCT row. I am attaching the data-config here.
>
>
A Solr document is flat - it does not have any nested rows. Your
productVersion* fields must be multi-valued otherwise the last row from 
product_version table will overwrite the previous rows.
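
A sketch of what that means in schema.xml (the field names and types here are
assumptions based on the description of the data-config; adjust to the real ones):

  <field name="productVersionId"   type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="productVersionName" type="text"   indexed="true" stored="true" multiValued="true"/>

With the version-level fields declared multi-valued, one product document can
hold the values from all of its VERSION rows instead of only the last one.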

--
Regards,
Shalin Shekhar Mangar.


Re: Batch file upload using solrJ API

2009-11-30 Thread javaxmlsoapdev

Any suggestions/pointers on this?

javaxmlsoapdev wrote:
> 
> Is there an API to upload files over one connection versus looping through
> all the files and creating a new ContentStreamUpdateRequest for each file?
> This, as expected, doesn't work if there are a large number of files, and
> quickly runs into memory problems. Please advise.
> 
> Thanks,
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26576268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Where to put ExternalRequestHandler and Tika jars

2009-11-30 Thread javaxmlsoapdev

Yes, the code I posted in the first thread does work, and I am able to retrieve data
from the document index. Did you include all required jars in the deployed solr
application's lib folder? What errors are you seeing?

Juan Pedro Danculovic wrote:
> 
> Hi! Does your example finally work? I index the data with solrj and I
> have
> the same problem and could not retrieve file data.
> 
> 
> On Wed, Nov 25, 2009 at 3:41 PM, javaxmlsoapdev  wrote:
> 
>>
>> g. I had to include tika and related parsing jars into
>> tomcat/webapps/solr/WEB-INF/lib.. this was an embarrassing mistake.
>> apologies for all the noise.
>>
>> Thanks,
>> --
>> View this message in context:
>> http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26576242.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Maximum number of fields allowed in a Solr document

2009-11-30 Thread Alex Wang
Thanks Otis for the reply. Yes this will be pretty memory intensive.  
The size of the index is 5 cores with a maximum of 500K documents each  
core. I did search the archives before but did not find any definite  
answer. Thanks again!

Alex



On Nov 27, 2009, at 11:09 PM, Otis Gospodnetic wrote:

> Hi Alex,
>
> There is no built-in limit.  The limit is going to be dictated by  
> your hardware resources.  In particular, this sounds like a memory  
> intensive app because of sorting on lots of different fields.  You  
> didn't mention the size of your index, but that's a factor, too.   
> Once in a while people on the list mention cases with lots and lots  
> of fields, so I'd check ML archives.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: Alex Wang 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Thu, November 26, 2009 12:47:36 PM
>> Subject: Maximum number of fields allowed in a Solr document
>>
>> Hi,
>>
>> We are in the process of designing a Solr app where we might have
>> millions of documents and within each of the document, we might have
>> thousands of dynamic fields. These fields are small and only contain
>> an integer, which needs to be retrievable and sortable.
>>
>> My question is:
>>
>> 1. Is there a limit on the number of fields allowed per document?
>> 2. What is the performance impact for such design?
>> 3. Has anyone done this before and is it a wise thing to do?
>>
>> Thanks,
>>
>> Alex
>



Word Concat 0 Results

2009-11-30 Thread Jeff Newburn
All,

I have a quick question for anyone with an idea how to solve this.  We have
times when our users don't put spaces between words.  So for instance
"airmax" returns 0 results but "air max" has at least 100 results.  Other
than adding to the synonyms file every time, is there a more programmatic
way we could possibly understand this scenario and return correct results.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com



Re: Phrase Query Issue

2009-11-30 Thread Erick Erickson
You didn't say what field you're searching on, but if it's any
field with one of the stemmers involved, then this is expected
behavior. That is, "test" and "tests" are both stemmed to
"test". I do see SnowballPorterFilterFactory in your schema,
so this is what I'd check first

Think carefully about removing this though. Depending on the
intent of the field users may well *want* this behavior.

Best
Erick

On Mon, Nov 30, 2009 at 6:02 AM, ravicv  wrote:

>
> Hi I have a problem with phrase search.
>
> If I search test then tests is also coming in search result.
>
> Please help me ...
>
> My schema continues...
> "
> [fieldType definitions; index and query analyzers including a StopFilterFactory
> (ignoreCase="true", words="stopwords.txt", enablePositionIncrements="true"),
> a WordDelimiterFilterFactory (generateWordParts="0", generateNumberParts="0",
> catenateWords="1", catenateNumbers="1", catenateAll="1", splitOnCaseChange="0",
> preserveOriginal="1"), a SnowballPorterFilterFactory (protected="protwords.txt"),
> and a query-time SynonymFilterFactory (ignoreCase="true", expand="false"),
> followed by a second fieldType (positionIncrementGap="100") with the same
> chain minus the SnowballPorterFilterFactory]
> "
>
> and copy field
>
> 
> 
>
> Please help me ...
>
>
>
>
> Otis Gospodnetic wrote:
> >
> >
> > Let me second this.  People ask for this pretty often.
> >
> >  Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Erik Hatcher 
> >> To: solr-user@lucene.apache.org
> >> Sent: Saturday, April 4, 2009 8:33:46 PM
> >> Subject: Re: Phrase Query Issue
> >>
> >>
> >> On Apr 4, 2009, at 1:25 AM, dabboo wrote:
> >>
> >> >
> >> > Erik,
> >> >
> >> > Thanks a lot for your reply. I have made some changes in the solr code
> >> and
> >> > now field clauses are working fine with dismax request. Not only this,
> >> > wildcard characters are also working with dismax and q query
> parameter.
> >> >
> >> > If you want I can share modified code with you.
> >>
> >> That'd be good to share.  Simply open a Solr JIRA issue with this
> >> enhancement
> >> request and post your code there.  Test cases and documentation always
> >> appreciated too, but working code to start with is fine.
> >>
> >> Erik
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Phrase-Query-Issue-tp22863529p26572797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


RE: configure solr

2009-11-30 Thread Jill Han
I got it running. -Dsolr.solr.home=c:\web\solr is needed.

Thanks all for the help,

Jill

-Original Message-
From: dipti khullar [mailto:dipti.khul...@gmail.com] 
Sent: Thursday, November 26, 2009 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: configure solr
X-HOSTLOC: alverno.edu/10.0.60.10

Hi

1. Issue with jetty:
When you start the jetty server by running start.jar, just look at the
logs
to verify whether jetty has started successfully or not. At times, the
port
you are using to start jetty(in your case 8983) could be used by some
other
apps, which can cause issues in start up.

2. Details are provided step by step in the wiki as mentioned by eric:
http://wiki.apache.org/solr/SolrTomcat

But I hope following points can be of some help in debugging issues with
Tomcat installation:

:: Configure Tomcat to recognise the solr home directory you created, by
adding the Java Option -Dsolr.solr.home=c:\web\solr
:: Also, you can easily debug the root cause of problem by looking into
catalina.out file under logs folder of your tomcat installation.

Thanks
Dipti


On Wed, Nov 25, 2009 at 3:25 AM, Jill Han  wrote:

> Hi,
>
> I just downloaded solr -1.4.0 to my computer, C:\apache-solr-1.4.0.
>
> 1.I followed the instruction to run the sample, java -jar
> start.jar at C:\apache-solr-1.4.0\example
>
> And then go to http://localhost:8983/solr/admin, however, I got
>
>
> HTTP ERROR: 404
>
>NOT_FOUND
>
> RequestURI=/solr/admin
>
> Powered by jetty:// 
>
> Did I miss something?
>
> 2.   Since I can't get sample run, I tried to run it on tomcat
> server(5.5) directly as
>
> a.   Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps,
>
> b.   Go to http://localhost:8080/apache-solr-1.4.0/
>
> The error message is" HTTP Status 500 - Severe errors in solr
> configuration.."
>
> 3.   How to configure it on tomcat server?
>
> Your help is appreciated very much as always,
>
> Jill
>
>
>
>
>
>


Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-30 Thread Erick Erickson
I think the problem here is that the tokenizer underlying WordDelimiterFilterFactory
is StandardTokenizer, at least that's what I infer from here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

I
think you want to use a different tokenizer, because StandardTokenizer
may be stripping the decimal from .355. But that's just a guess. You'll get
more info if you examine your index and see what's *really* indexed in
these fields
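
Something along these lines is what I have in mind (an untested sketch; it just
puts a whitespace tokenizer in front of the WordDelimiterFilterFactory settings
you show below):

  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateNumberParts="0"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
  </analyzer>

That way ".355" reaches the filter intact instead of being split apart by the
tokenizer first.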

Best
Erick

On Sun, Nov 29, 2009 at 10:31 AM, Rahul R  wrote:

> Steve,
> My settings for both index and query are :
>  generateNumberParts="0" catenateWords="1" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1"/>
>
> Let me give an example. Suppose I have the following 2 documents:
> Document 1(Text Field): Bridge-Diode .355 Volts
> Document 2(Text Field): Bridge-Diode 355 Volts
>
> Requirement : Search for ".355" should retrieve only document 1 (Not
> happening now)
> Requirement: Search for "Bridge" should retrieve both documents (Works as
> expected)
>
> The reason why a search for ".355" is retrieving both documents is that
> term
> texts for .355 in the document are created as .355 and 355. Even if I set
> generateWordParts and catenateWords to "0", the way term texts are created
> for ".355" does not change.
>
> Thank you for your time.
>
> Regards
> Rahul
>
> On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe  wrote:
>
> > Hi Rahul,
> >
> > On 11/26/2009 at 12:53 AM, Rahul R wrote:
> > > Is there a way by which I can prevent the WordDelimiterFilterFactory
> > > from totally acting on numerical data ?
> >
> > "prevent ... from totally acting on" is pretty vague, and nowhere AFAICT
> do
> > you say precisely what it is you want.
> >
> > It would help if you could give example text and the terms you think
> should
> > be the result of analysis of the text.  If you want different index/query
> > time behavior, please provide this info for both.
> >
> > Steve
> >
> >
>


Question regarding scoring/boosting

2009-11-30 Thread Oliver Beattie
Hey everyone,

I'm what one would probably call a beginner with Solr. I have my data loaded
in and I am getting the hang of querying things. However, I'm still rather
unclear as to how the score can be affected by various parameters. I'm using
the dismax request handler, and I just don't quite get how doing foo^value
in the bf affects the score. Perhaps if someone could explain this at a
basic level or point me in the direction of some documentation as to how
this affects the final score this would be very helpful.

Thanks,
Oliver


RE: nested solr queries

2009-11-30 Thread Smiley, David W.
Shameless plug here: if you're having trouble grasping the schema and 
differences from relational databases then I think you'll find my book helpful 
(chapter 2).
https://www.packtpub.com/solr-1-4-enterprise-search-server/book
~ David Smiley


From: Mark N [nipen.m...@gmail.com]
Sent: Monday, November 30, 2009 4:36 AM
To: solr-user@lucene.apache.org
Subject: Re: nested solr queries

Thanks for your help. So do you think I should execute solr queries twice,
or is there any other workaround?




On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 2:26 PM, Mark N  wrote:
>
> > field2="xyz" we dont know until we run query1
> >
> >
> Ah, ok. I thought xyz was a literal that you wanted to search.
>
>
> > To simply i was actually trying to do some kind of JOIN similar to
> > following
> > SQL query
> >
> >
> >  select  * from table1  where  *field2*  in
> >  ( select *field2  *from dbo.concept_db where field1='ABC' )
> >
> > if this is not possible then i will have to search inner query  (
> > select *field2
> > *from dbo.concept_db where field1='ABC' )  first and then only  run the
> > outer query
> >
> >
> No, there are no joins in Solr. Consider de-normalizing your schema, if you
> haven't.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



--
Nipen Mark

Status of Spelt integration

2009-11-30 Thread Andrey Klochkov
Hi all

I searched through the mail-list archives and saw that sometime ago Toby
Cole was going to integrate a spellchecker named Spelt into Solr. Does
anyone know what's the status of this? Anyone tried to use it with Solr? Does
it make sense to try it instead of standard spell checker?

Some links on the subject:
http://markmail.org/message/cqt4qtzzwyceltqu#query:+page:1+mid:cqt4qtzzwyceltqu+state:results
http://markmail.org/search/?q=spelt#query:spelt+page:1+mid:krzofzojhg7hmms7+state:results
http://groups.google.com/group/spelt

-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


Re: Phrase Query Issue

2009-11-30 Thread ravicv

Hi I have a problem with phrase search.

If I search for test then tests also comes back in the search results.

Please help me ...

My schema continues...
"
 
  < type="index">








 
  < a nalyzer type="query">







  




  








  
  







  
"

and copy field 

 
 

Please help me ...




Otis Gospodnetic wrote:
> 
> 
> Let me second this.  People ask for this pretty often.
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: Erik Hatcher 
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, April 4, 2009 8:33:46 PM
>> Subject: Re: Phrase Query Issue
>> 
>> 
>> On Apr 4, 2009, at 1:25 AM, dabboo wrote:
>> 
>> > 
>> > Erik,
>> > 
>> > Thanks a lot for your reply. I have made some changes in the solr code
>> and
>> > now field clauses are working fine with dismax request. Not only this,
>> > wildcard characters are also working with dismax and q query parameter.
>> > 
>> > If you want I can share modified code with you.
>> 
>> That'd be good to share.  Simply open a Solr JIRA issue with this
>> enhancement 
>> request and post your code there.  Test cases and documentation always 
>> appreciated too, but working code to start with is fine.
>> 
>> Erik
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Phrase-Query-Issue-tp22863529p26572797.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: nested solr queries

2009-11-30 Thread Shalin Shekhar Mangar
I don't know the use-case, so I cannot suggest anything. Solr is different
from relational databases, and techniques that are taken for granted in the
RDBMS world are usually not required, or have bad performance
characteristics, in Solr. You shouldn't try to solve problems the same way in
Solr as in a database.

You have told us that you need to do joins, but you haven't told us *why* you
need to do joins. There may be other ways of solving the same problem. If
not, two queries may be the only way to go.

On Mon, Nov 30, 2009 at 3:06 PM, Mark N  wrote:

> thanks for your help so do you think I should execute solr queries twice ?
> or is there any other workarounds
>
>
>
>
> On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > On Mon, Nov 30, 2009 at 2:26 PM, Mark N  wrote:
> >
> > > field2="xyz" we dont know until we run query1
> > >
> > >
> > Ah, ok. I thought xyz was a literal that you wanted to search.
> >
> >
> > > To simply i was actually trying to do some kind of JOIN similar to
> > > following
> > > SQL query
> > >
> > >
> > >  select  * from table1  where  *field2*  in
> > >  ( select *field2  *from dbo.concept_db where field1='ABC' )
> > >
> > > if this is not possible then i will have to search inner query  (
> > > select *field2
> > > *from dbo.concept_db where field1='ABC' )  first and then only  run the
> > > outer query
> > >
> > >
> > No, there are no joins in Solr. Consider de-normalizing your schema, if
> you
> > haven't.
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> Nipen Mark
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: nested solr queries

2009-11-30 Thread Mark N
Thanks for your help. So, do you think I should execute Solr queries twice,
or is there another workaround?




On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 2:26 PM, Mark N  wrote:
>
> > field2="xyz" we dont know until we run query1
> >
> >
> Ah, ok. I thought xyz was a literal that you wanted to search.
>
>
> > To simply i was actually trying to do some kind of JOIN similar to
> > following
> > SQL query
> >
> >
> >  select  * from table1  where  *field2*  in
> >  ( select *field2  *from dbo.concept_db where field1='ABC' )
> >
> > if this is not possible then i will have to search inner query  (
> > select *field2
> > *from dbo.concept_db where field1='ABC' )  first and then only  run the
> > outer query
> >
> >
> No, there are no joins in Solr. Consider de-normalizing your schema, if you
> haven't.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Nipen Mark


Re: nested solr queries

2009-11-30 Thread Shalin Shekhar Mangar
On Mon, Nov 30, 2009 at 2:26 PM, Mark N  wrote:

> field2="xyz" we dont know until we run query1
>
>
Ah, ok. I thought xyz was a literal that you wanted to search.


> To simply i was actually trying to do some kind of JOIN similar to
> following
> SQL query
>
>
>  select  * from table1  where  *field2*  in
>  ( select *field2  *from dbo.concept_db where field1='ABC' )
>
> if this is not possible then i will have to search inner query  (
> select *field2
> *from dbo.concept_db where field1='ABC' )  first and then only  run the
> outer query
>
>
No, there are no joins in Solr. Consider de-normalizing your schema, if you
haven't.

-- 
Regards,
Shalin Shekhar Mangar.
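
As a rough sketch of what de-normalizing can look like at index time with SolrJ (the server URL and field names below are assumptions for illustration, not from this thread): instead of storing only a foreign key and joining at query time, copy the looked-up value onto every document that refers to it, so a single query can filter on it directly.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DenormalizeExample {
    public static void main(String[] args) throws Exception {
        // Assumed server URL for illustration.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One document per record, with the value that would otherwise live in a
        // separate "concept" table copied onto the record itself.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "record-1");
        doc.addField("field1", "ABC");
        doc.addField("field2", "xyz");   // de-normalized value; no join needed at query time
        server.add(doc);
        server.commit();
    }
}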


Re: nested solr queries

2009-11-30 Thread Mark N
field2="xyz" we dont know until we run query1

To simply i was actually trying to do some kind of JOIN similar to following
SQL query


 select  * from table1  where  *field2*  in
 ( select *field2  *from dbo.concept_db where field1='ABC' )

if this is not possible then i will have to search inner query  (
select *field2
*from dbo.concept_db where field1='ABC' )  first and then only  run the
outer query

thanks
chandan




On Mon, Nov 30, 2009 at 2:25 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 2:02 PM, Mark N  wrote:
>
> > hi shalin
> >
> > I am trying to achieve something like JOIN. Previously am doing this with
> > two queries on solr
> >
> > solr index  = ( field1 ,field 2, field3)
> >
> > query1 = (  for  example field1="ABC" )
> >
> > suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1
> >
> > query2 = (   get all records having field2="xyz" for each records  i.e
>  for
> > set1= {1,2,3,4} returned by query1 )
> >
> >
> That sequence of queries will return documents which have field1="ABC" and
> field2="xyz". The same result can be obtained in one query with
> q=+field1:"ABC" +field2:"xyz"
>
> Have I misunderstood the problem?
>
>
> > Am not sure if I could do something like this using the nested solr query
> > from link
> >
> > http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
> >
> >
> No, nested queries can only influence scores. They do not filter the
> results.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: nested solr queries

2009-11-30 Thread Shalin Shekhar Mangar
On Mon, Nov 30, 2009 at 2:02 PM, Mark N  wrote:

> hi shalin
>
> I am trying to achieve something like JOIN. Previously am doing this with
> two queries on solr
>
> solr index  = ( field1 ,field 2, field3)
>
> query1 = (  for  example field1="ABC" )
>
> suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1
>
> query2 = (   get all records having field2="xyz" for each records  i.e  for
> set1= {1,2,3,4} returned by query1 )
>
>
That sequence of queries will return documents which have field1="ABC" and
field2="xyz". The same result can be obtained in one query with
q=+field1:"ABC" +field2:"xyz"

Have I misunderstood the problem?


> Am not sure if I could do something like this using the nested solr query
> from link
>
> http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
>
>
No, nested queries can only influence scores. They do not filter the
results.

-- 
Regards,
Shalin Shekhar Mangar.
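
On the SolrJ side, the single combined query described above can be issued like this; a minimal sketch, assuming only that the server runs at localhost:8983:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CombinedQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // assumed URL

        // Both clauses are mandatory (+), so this returns documents matching
        // field1:"ABC" AND field2:"xyz" in a single round trip, with no join needed.
        SolrQuery query = new SolrQuery("+field1:\"ABC\" +field2:\"xyz\"");
        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}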


Re: Intensive querying give odd results in search

2009-11-30 Thread Shalin Shekhar Mangar
On Thu, Nov 26, 2009 at 9:14 PM, jmsm  wrote:

>
> Hi, All.
>
> I have a problem regarding intensive query requesting.
> I'm using SolrJ client through http in the client side and Solr 1.4 and
> tomcat 6.0.20 on the server side.
>
> My purpose is to execute 3 different queries for each word in a list of
> words and get the number of results.
>
> On the client
> **
> String queryText = "titexpl:NASA";
>
> SolrServer server = ServerConSearch.getInstance().getServer();
> SolrQuery query = new SolrQuery();
> query.setQuery(queryText);
> query.addField("*");
> query.addField("score");
> query.setRows(0);
> QueryResponse rsp = server.query(query);
> final long resultsNum = rsp.getResults().getNumFound();
>
>
> My problem is that I get different results for the same query. For example,
> if I search "titexpl:NASA" I get 10 results for the first time but after
> some tries I get 0 results. After more tries, I get 10 results again an so
> on.
>
> I tried with Tomcat running on Windows Vista and Ubuntu (in VMWare) but
> always got the same problem. Also used the default Jetty that cames with
> Solr, but no luck.
>
>
> Any thoughts?
>
> Ze Marques
>
>
> PS: No index update is being done during the queries.
> PS 2: Adding a sleep of 50ms for each query request, seems to solve the
> problem.
>
>
That is very strange. Is it possible that errors during search are getting
counted as 0 results in your program?

If you can isolate the problem into a repeatable test case, we can try to
figure out what is wrong. I haven't yet seen this issue elsewhere.

-- 
Regards,
Shalin Shekhar Mangar.
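
One way to build such a repeatable test case, as a rough sketch (the server URL, query, and iteration count are assumptions based on the snippet quoted above): run the same query in a loop and count exceptions separately from genuine zero-hit responses, so the two cases cannot be confused.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RepeatableQueryTest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // assumed URL
        SolrQuery query = new SolrQuery("titexpl:NASA");
        query.setRows(0); // only numFound is of interest here

        int zeroHits = 0;
        int errors = 0;
        for (int i = 0; i < 1000; i++) {
            try {
                QueryResponse rsp = server.query(query);
                if (rsp.getResults().getNumFound() == 0) {
                    zeroHits++;
                    System.out.println("iteration " + i + ": 0 results");
                }
            } catch (SolrServerException e) {
                errors++;
                System.out.println("iteration " + i + ": exception: " + e.getMessage());
            }
        }
        System.out.println("zero-hit responses: " + zeroHits + ", exceptions: " + errors);
    }
}

If the zero-hit counter climbs while the exception counter stays at zero, the problem really is on the Solr side; if only the exception counter climbs, the client is most likely misreporting errors as empty results.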


Re: nested solr queries

2009-11-30 Thread Mark N
hi shalin

I am trying to achieve something like a JOIN. Previously I was doing this with
two queries on Solr.

solr index  = ( field1 ,field 2, field3)

query1 = (  for  example field1="ABC" )

suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1

query2 = (   get all records having field2="xyz" for each records  i.e  for
set1= {1,2,3,4} returned by query1 )

I am not sure if I could do something like this using the nested Solr query
from the link:

http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/



thanks


On Mon, Nov 30, 2009 at 1:50 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 1:19 PM, Mark N  wrote:
>
> > Is it possible to write nested queries in Solr similar to sql like query
> > where  I can take results of the first query and use one or more of its
> > fields as an argument in the second query.
> >
> >
> That sounds like a join. If so, the answer would be no.
>
>
> >
> > For example:
> >
> > field1:XYZ AND (_query_: field3:{value of field4})
> >
> > This should search for all types of XYZ and then iterate over the result
> > set
> > and perform a query for where field3  is equal to the value of field1
> from
> > each item of the first result set.
> >
> >
> Your description is not consistent with the query you have given. If
> field:XYZ is specified, then what are "types" of XYZ? Also, if you want to
> perform a query where field3 is equal to the value of field1 then, what is
> field4 in the query you have given?
>
>
> > this is similar to SQL like query
> >
> >
> > select distinct ( fieldA ) from table where fieldA  IN
> >
>
> That sounds similar to faceting. See
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> Perhaps you can give more details on what you want to achieve.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Wildcard searches within phrases to use proximity

2009-11-30 Thread Shalin Shekhar Mangar
On Fri, Nov 27, 2009 at 12:33 AM, AHMET ARSLAN  wrote:

> > That'd be great. Please open an issue in Jira and attach a
> > patch. See
> > http://wiki.apache.org/solr/HowToContribute
> >
>
> Hi Shalin,
> I opened an issue (SOLR-1604) and attached a patch as well as a maven
> project to enable this feature without applying the patch. I couldn't
> consume ComplexPhraseQueryParser from lucene-misc-2.9.1.jar, because there
> is a fixed bug that is not yet included in a Lucene release. LUCENE-1486 says
> guidance is needed from the Solr team about the preferred course of action.
>
> I will add more test cases to the patch in the future.
>
>
Thanks Ahmet, I've marked the issue for 1.5 so we do not forget about it.
I'll take a look at the patch soon.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Multi index

2009-11-30 Thread Shalin Shekhar Mangar
On Sat, Nov 28, 2009 at 3:12 PM, Jörg Agatz wrote:

> Hello users,
>
> At the moment I am testing multi-core Solr, but I can't search in more than
> one core directly.
>
> Is there a way to use multiple indexes (3-5 indexes) in one core and search
> directly across all of them, or only in one?
>
>
>
You can search across all cores if the schema.xml is the same. See
http://wiki.apache.org/solr/DistributedSearch

If schema.xml is different, you can search on one core only. You can
denormalize and combine cores if you want to search on all of them.


-- 
Regards,
Shalin Shekhar Mangar.
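
For reference, a distributed search across cores with identical schemas is just an ordinary query with a shards parameter listing each core. A rough SolrJ sketch, with host and core names that are assumptions for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedSearchExample {
    public static void main(String[] args) throws Exception {
        // Send the request to any one core; the shards parameter fans the query
        // out to every core listed. Host and core names are assumed.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");

        SolrQuery query = new SolrQuery("*:*");
        query.set("shards",
            "localhost:8983/solr/core0,localhost:8983/solr/core1,localhost:8983/solr/core2");

        QueryResponse rsp = server.query(query);
        System.out.println("total hits across cores: " + rsp.getResults().getNumFound());
    }
}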


Re: nested solr queries

2009-11-30 Thread Shalin Shekhar Mangar
On Mon, Nov 30, 2009 at 1:19 PM, Mark N  wrote:

> Is it possible to write nested queries in Solr similar to sql like query
> where  I can take results of the first query and use one or more of its
> fields as an argument in the second query.
>
>
That sounds like a join. If so, the answer would be no.


>
> For example:
>
> field1:XYZ AND (_query_: field3:{value of field4})
>
> This should search for all types of XYZ and then iterate over the result
> set
> and perform a query for where field3  is equal to the value of field1 from
> each item of the first result set.
>
>
Your description is not consistent with the query you have given. If
field:XYZ is specified, then what are "types" of XYZ? Also, if you want to
perform a query where field3 is equal to the value of field1, then what is
field4 in the query you have given?


> this is similar to SQL like query
>
>
> select distinct ( fieldA ) from table where fieldA  IN
>

That sounds similar to faceting. See
http://wiki.apache.org/solr/SimpleFacetParameters

Perhaps you can give more details on what you want to achieve.

-- 
Regards,
Shalin Shekhar Mangar.
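
For the "select distinct ( fieldA ) ... where fieldA IN" part of the question, faceting returns the distinct values of a field within a result set along with their counts. A rough SolrJ sketch, with the server URL assumed and the field names taken from the example above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // assumed URL

        // Restrict to the documents of interest, then ask for the distinct values
        // of fieldA among them, together with how often each value occurs.
        SolrQuery query = new SolrQuery("field1:XYZ");
        query.setRows(0);              // only the facet counts are needed, not the documents
        query.setFacet(true);
        query.addFacetField("fieldA");
        query.setFacetMinCount(1);     // drop values that never occur in this result set

        QueryResponse rsp = server.query(query);
        for (FacetField.Count c : rsp.getFacetField("fieldA").getValues()) {
            System.out.println(c.getName() + " : " + c.getCount());
        }
    }
}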