Custom Handler support in Solr-ruby
Hi, I found the solr-ruby gem (http://wiki.apache.org/solr/solr-ruby) really inflexible when it comes to specifying a handler. The Solr::Request::Select class hard-codes the handler as 'select', and all the other request classes inherit from it. Since the methods in Solr::Connection each use one of the Solr::Request classes, I don't see a direct way to use a custom handler (which I have written for MoreLikeThis). My current approach is to build the query URL myself, fetch it with curl, parse the response, and return it. Even if I were to extend the classes, I'd end up writing a new Solr::Request::CustomSelect, identical to Solr::Request::Select except that it lets the user supply a handler (defaulting to 'select'), and then deriving separate classes for DisMax and the rest from it. Isn't that too much overhead? Or am I missing something? Also, where can I file bugs against solr-ruby? *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
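The curl workaround described above can be sketched in a few lines. This is not the solr-ruby API, just a hedged illustration of building the query URL for a custom handler; the handler path ("mlt") and parameters are assumptions for the example, so substitute whatever path your handler is registered under in solrconfig.xml.

```python
from urllib.parse import urlencode

def custom_handler_url(base, handler, params):
    # Build "<base>/<handler>?<query string>" with proper URL escaping.
    # The handler name is whatever your solrconfig.xml registers, e.g. "mlt".
    return "%s/%s?%s" % (base.rstrip("/"), handler.lstrip("/"), urlencode(params))

url = custom_handler_url(
    "http://localhost:8983/solr",
    "mlt",  # hypothetical custom MoreLikeThis handler path
    {"q": "id:1234", "mlt.fl": "title,body", "wt": "ruby"},
)
print(url)
```

Fetching `url` with curl or Net::HTTP and parsing the `wt=ruby` response then matches the workaround described in the message.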
Include synonyms in solr
Hi, I am using Solr for my searches. I found a synonyms.txt file in which you can manually include synonyms for the words you want. But I suppose it would be very hard to include synonyms manually for each word, as my application has a lot of data. I want to know whether there is any way to generate this synonyms.txt file automatically, covering all dictionary words. - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Include-synonys-in-solr-tp3116836p3116836.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Include synonyms in solr
On Tue, Jun 28, 2011 at 12:54 PM, Romi romijain3...@gmail.com wrote: Hi, i am using solr for my searches. in this i found a synonyms.text file in which you can include synonyms manually for the words u want. Please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory No offence, but a simple Google search, or a search of the Wiki would have turned this up. Please try such simpler avenues before dashing off a message to the list. Regards, Gora
Re: Include synonyms in solr
Am 28.06.2011 09:24, schrieb Romi: But as i suppose it would be very hard to include synonyms manually for each word as my application has large data. I want to know is there any way that this synonym.text file generate automatically referring to all dictionary words I don't get the point here. Why should you want to add all dictionary words to the synonyms? To what shall they translate? Just having all words in synonyms.txt doesn't make much sense. If you're asking about some kind of translation into another language: In that case, you'd rather translate the text at index time and put it into another field which you query as well. In my last project, we had multi-valued fields like meta_description and misspelled, where you could add arbitrary synonyms for each document - maybe that's what you're asking for? -Kuli
Re: Analyzer creates PhraseQuery
You could add this filter after the NGram filter to prevent the phrase query creation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic. - Jouve, France.
Find results with or without whitespace
I'm looking for a way to index/search terms that may or may not contain spaces. An example explains it best:
- Looking for healthcare, I want to find both healthcare and health care.
- Looking for health care, I want to find both health care and healthcare.
My other constraints are:
- I will index rather long strings (extracted from Office documents).
- I want to avoid synonym lists (as they may be incomplete).
- I want to avoid special-case logic (i.e. query rewriting with as many ORs as the search-term combinations require).
- I don't want to rely on an uppercase/lowercase tokenizer (as users are... creative).
I have already tried many tokenizer/filter combinations without success, and did not find any answer to this problem.
Re: multiple spatial values
Yonik Seeley-2-2 wrote: On Sat, Jun 25, 2011 at 5:56 AM, marthinal <jm.rodriguez.ve...@gmail.com> wrote: sfield, pt and d can all be specified directly in the spatial functions/filters too, and that will override the global params. Unfortunately one must currently use Lucene query syntax to do an OR; it just makes it look a bit messier. q=_query_:{!geofilt} _query_:{!geofilt sfield=location_2} -Yonik http://www.lucidimagination.com @Yonik it seems to work like this; I tried hundreds of other possibilities without success: q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq={!geofilt sfield=location_2 pt=40.51,-5.91 d=500} Ah, right. I had thought you wanted docs that matched either geofilt (hence OR), not docs that only matched both. -Yonik http://www.lucidimagination.com Yes Yonik, what I do now is q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq=_query_:{!geofilt sfield=location_2 pt=40.51,-5.91 d=500} other_filter:value ... I write the query here because maybe it helps someone who needs to do something like this.
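The combined query above has to be URL-escaped when sent over HTTP. A hedged sketch of assembling it (field names and coordinates are taken from the thread; wrapping the nested `_query_` clause in quotes is an assumption about the Lucene nested-query syntax, so verify it against your Solr version):

```python
from urllib.parse import urlencode

# Two geofilt clauses against two different location fields, as in the thread:
# q filters on location_1, fq on location_2.
params = {
    "q": "{!geofilt sfield=location_1 pt=36.62,-6.23 d=50}",
    "fq": '_query_:"{!geofilt sfield=location_2 pt=40.51,-5.91 d=500}"',
}
query_string = urlencode(params)  # escapes {, }, !, quotes, commas safely
print(query_string)
```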
Index Version and Epoch Time?
Hi, I am not sure what the index version value is. It looks like an epoch time, but in my case it points to one month back. However, I can see documents which were added last week in the index. Even after I did a commit, the index version did not change. Isn't it supposed to change on every commit? If not, is there a way to look up the last index time? Also, this page http://wiki.apache.org/solr/SolrReplication#Replication_Dashboard shows a Replication Dashboard. How is this dashboard invoked? Is there a URL which needs to be called? *Pranav Prakash*
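If the index version is indeed epoch milliseconds (an assumption worth verifying against your own commit times), converting it to a readable date is straightforward:

```python
from datetime import datetime, timezone

def version_to_datetime(index_version: int) -> datetime:
    # Assumption: the replication indexversion is an epoch timestamp in
    # milliseconds. Divide by 1000 to get seconds, interpret as UTC.
    return datetime.fromtimestamp(index_version / 1000, tz=timezone.utc)

# A value around 1309249930000 decodes to late June 2011.
print(version_to_datetime(1309249930000))
```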
Re: Include synonyms in solr
Please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory No offence, but a simple Google search, or a search of the Wiki, would have turned this up. Please try such simpler avenues before dashing off a message to the list. Gora, I have already read the document and have also included synonyms in my search results :) My question is: when I use this filter, <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>, I need to enter the synonyms manually in synonyms.txt, which is really tough if you have many words needing synonyms. I wanted to ask whether there is any other option so that I need not enter synonyms manually. I hope you got my point :) - Thanks & Regards Romi
Re: Include synonyms in solr
I don't want to add all dictionary words to my synonyms.txt, but I do want to include synonyms for the words that occur in my data. As you can imagine, if I have, say, 1000 words, it would be very tough to enter synonyms for those 1000 words in synonyms.txt manually. I just want to know how I can solve this puzzle so that I need not enter the synonyms by hand. For example, for GB I am entering gigabyte; for ring I am entering the synonyms band, circle. - Thanks & Regards Romi
Re: Include synonyms in solr
Well, you need to find word lists and/or a thesaurus. This is one place to start: http://wordlist.sourceforge.net/ I used the US/UK English word list for the synonyms of an index I have, because it contains both US and UK English terms; the list lacks some medical terms, though, so we just added them. Cheers, François On Jun 28, 2011, at 6:55 AM, Romi wrote: i wanted to ask is there any other option so that i need not to enter synonyms manually..
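Once a word list or thesaurus is loaded into a mapping, emitting the synonyms.txt format is mechanical. A hedged sketch; the thesaurus dict here is a placeholder for whatever source you actually load (e.g. a list from wordlist.sourceforge.net):

```python
def synonyms_lines(thesaurus):
    # SynonymFilterFactory accepts comma-separated equivalence groups,
    # one group per line, e.g. "gb,gigabyte".
    lines = []
    for word, syns in sorted(thesaurus.items()):
        lines.append(",".join([word] + list(syns)))
    return lines

# Placeholder data; in practice this dict comes from your word list.
thesaurus = {"gb": ["gigabyte"], "ring": ["band", "circle"]}
for line in synonyms_lines(thesaurus):
    print(line)
# gb,gigabyte
# ring,band,circle
```

Writing the returned lines to synonyms.txt (one per line) gives a file the filter can consume; whether `expand` should be true or false depends on whether you want the groups treated as equivalent or mapped one way.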
Re: Find results with or without whitespace
I had the same problem: http://lucene.472066.n3.nabble.com/Results-with-and-without-whitespace-soccer-club-and-soccerclub-td2934742.html#a2964942
Re: Removing duplicate documents from search results
I also have the problem of duplicate docs. I am indexing news articles; every news article has a source URL. If two news articles have the same URL, only one needs to be indexed, i.e. removal of duplicates at index time. On 23 June 2011 21:24, simon mtnes...@gmail.com wrote: Have you checked out the deduplication process that's available at indexing time? It includes a fuzzy hash algorithm. http://wiki.apache.org/solr/Deduplication -Simon On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote: This approach would definitely work if the two documents are *exactly* the same. But it is very fragile: even if one extra space has been added, the whole hash will change. What I am really looking for is some percentage similarity between documents, so I can remove those which are more than 95% similar. *Pranav Prakash* On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote: What you need to do is calculate some hash (using any message digest algorithm you want: md5, sha-1 and so on), then do some reading on Solr's field collapse capabilities. Should not be too complicated.. *Omri Cohen* Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric [image: Twitter] http://www.twitter.com/omricohe [image: WordPress] http://omricohen.me Please consider your environmental responsibility. Before printing this e-mail message, ask yourself whether you really need a hard copy. IMPORTANT: The contents of this email and any attachments are confidential. They are intended for the named recipient(s) only. If you have received this email by mistake, please notify the sender immediately and do not disclose the contents to anyone or make copies thereof. -- Forwarded message -- From: Pranav Prakash pra...@gmail.com Date: Thu, Jun 23, 2011 at 12:26 PM Subject: Removing duplicate documents from search results To: solr-user@lucene.apache.org How can I remove very similar documents from search results? My scenario is that there are documents in the index which are almost identical (people submitting the same stuff multiple times, sometimes different people submitting the same stuff). When a search is performed for a keyword, the same document quite frequently comes up multiple times in the top N results. I want to remove those duplicate (or probable duplicate) documents, very much like what Google does when it says "In order to show you the most relevant results, duplicates have been removed." How can I achieve this functionality using Solr? Does Solr have anything built in, or a plugin, which could help me with it? *Pranav Prakash* -- Thanks and Regards Mohammad Shariq
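The ~95% similarity idea can be prototyped outside Solr. A sketch using Python's difflib; note this is not what Solr's Deduplication does internally (the wiki page describes a fuzzy-hash signature), and the pairwise comparison is O(n^2), so it will not scale to a large index:

```python
from difflib import SequenceMatcher

def near_duplicates(docs, threshold=0.95):
    """Return (i, j) index pairs of docs whose similarity ratio meets threshold.

    Pairwise comparison over all docs -- fine as a prototype, far too slow
    for a full index; a scalable version needs a signature/hash scheme like
    the one on the Solr Deduplication wiki page.
    """
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if SequenceMatcher(None, docs[i], docs[j]).ratio() >= threshold:
                pairs.append((i, j))
    return pairs

docs = [
    "Solr is an open source search platform built on Lucene.",
    "Solr is an open source search platform built on Lucene!",  # near-duplicate
    "Completely different text about synonyms and tokenizers.",
]
print(near_duplicates(docs))  # → [(0, 1)]
```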
Re: Removing duplicate documents from search results
Create a hash from the URL and use that as the unique key; md5 or sha1 would probably be good enough. Cheers, François On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote: I also have the problem of duplicate docs. I am indexing news articles, Every news article will have the source URL, If two news-article has the same URL, only one need to index, removal of duplicate at index time.
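A minimal sketch of the URL-hash signature suggested above; the light normalization step is an assumption (adjust it to how your source URLs actually vary):

```python
import hashlib

def url_key(url: str) -> str:
    # Normalize lightly so trivially different forms of the same URL collide;
    # lowercasing the whole URL is a simplification (paths can be
    # case-sensitive), so tune this to your data.
    normalized = url.strip().lower().rstrip("/")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

print(url_key("http://example.com/story/123"))
print(url_key("http://example.com/story/123/") == url_key("HTTP://example.com/story/123"))  # → True
```

The 32-character hex digest can then serve as a deduplication signature field even when the uniqueKey itself is a UUID.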
Re: multiple spatial values
Will it be possible to do spatial searches on multi-valued spatial fields soon? I have a latlon (point) field that is multi-valued and don't know how to search against it such that the lats and lons match up correctly, since they are split apart. E.g. I have a document with 10 point/latlon values in the same field. On 06/28/2011 05:15 AM, marthinal wrote: Yes Yonik what i do now is q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq=_query_:{!geofilt sfield=location_2 pt=40.51,-5.91 d=500} other_filter:value ...
Re: Removing duplicate documents from search results
I am making the hash from the URL, but I can't use this as the uniqueKey because I am using a UUID as the uniqueKey. Since I am using Solr as the index engine only, with Riak (key-value storage) as the storage engine, I don't want to overwrite on duplicates; I just need to discard them. 2011/6/28 François Schiettecatte fschietteca...@gmail.com: Create a hash from the url and use that as the unique key, md5 or sha1 would probably be good enough. Cheers François -- Thanks and Regards Mohammad Shariq
Re: Default schema - 'keywords' not multivalued
On 06/27/2011 11:23 AM, lee carroll wrote: Hi Tod, A list of keywords would be fine in a non-multivalued field: keywords: xxx yyy sss aaa. A multivalued field would allow you to repeat the field when indexing: keywords: xxx keywords: yyy keywords: sss etc. Thanks Lee. The problem is that I'm manually pushing a document (via stream.url) and its metadata from a database with the Solr /update/extract REST service, over HTTP GET, using Perl. I'm streaming over the document content (presumably via Tika), and it's gathering the document's metadata, which includes the keywords metadata field. Since I'm also passing that field from the DB to the REST call as a list (as you suggested), there is a collision, because the keywords field is single-valued. I can change this behavior using a copyField. What I wanted to know is whether there was a specific reason the default schema defined a field like keywords as single-valued, so I could make sure I wasn't missing something before I changed things. While I'm at it, I'd REALLY like to know how to use DIH to index the metadata from the database while simultaneously streaming over the document content and indexing it. I've never quite figured it out, but I have to believe it is possible. - Tod
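The stream.url push with DB metadata attached can be sketched as URL construction; `literal.<field>` is the ExtractingRequestHandler convention for supplying field values alongside the extracted content, and the base URL and field names below are placeholders for your setup:

```python
from urllib.parse import urlencode

# Sketch: an /update/extract request that streams a remote document and
# attaches database metadata via literal.* parameters.
base = "http://localhost:8983/solr/update/extract"  # placeholder host/core
params = [
    ("stream.url", "http://docs.example.com/report.pdf"),  # placeholder doc
    ("literal.id", "doc-42"),
    # Repeat literal.<field> once per value for a multiValued field:
    ("literal.keywords_ss", "finance"),
    ("literal.keywords_ss", "quarterly"),
    ("commit", "true"),
]
url = base + "?" + urlencode(params)
print(url)
```

Passing a list of (key, value) pairs to urlencode is what lets the same `literal.keywords_ss` parameter repeat, which is the multivalued-field shape the thread is after.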
Re: Find results with or without whitespace
Thank you for your answer. I agree, I can manage predictable values through synonyms. However, most data in this index are company and product names, sometimes with rather strange syntax (a mix of upper/lower case, misplaced dashes or spaces). One purpose of using Solr was to help find potential duplicates before data insertion. On the other hand, I could write a custom tokenizer/filter and a custom query builder that would test many combinations, but I have the feeling that is an inefficient approach. That is: Indexing: chelsea soccer club => chelsea, soccer, club, chelseasoccer, soccerclub, chelseasoccerclub. Searching: chelsea soccerclub => chelsea AND (soccerclub OR chelseasoccerclub). While search expressions are generally short, indexing will be a nightmare...
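The token stream described above can be sketched in a few lines. In Solr itself something similar can reportedly be produced by ShingleFilterFactory with an empty tokenSeparator, but this standalone version just illustrates the idea:

```python
def with_concatenations(text, max_shingle=3):
    # Emit each token plus concatenations of up to max_shingle adjacent
    # tokens, so "health care" also yields "healthcare".
    tokens = text.lower().split()
    out = list(tokens)
    for size in range(2, max_shingle + 1):
        for i in range(len(tokens) - size + 1):
            out.append("".join(tokens[i:i + size]))
    return out

print(with_concatenations("chelsea soccer club"))
# ['chelsea', 'soccer', 'club', 'chelseasoccer', 'soccerclub', 'chelseasoccerclub']
```

Applying this at index time (and a milder variant at query time) gives the OR-free matching the message wants, at the cost of the index blow-up it predicts.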
Re: Removing duplicate documents from search results
Maybe there is a way to get Solr to reject documents that already exist in the index, but I doubt it; maybe someone else can chime in here. You could do a search for each document prior to indexing it to see if it is already in the index, but that is probably non-optimal. It is probably easiest to check whether the document exists in your Riak repository: if not, add it and index it; drop it if it already exists. François On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote: I am making the Hash from URL, but I can't use this as UniqueKey because I am using UUID as UniqueKey, Since I am using SOLR as index engine Only and using Riak(key-value storage) as storage engine, I dont want to do the overwrite on duplicate. I just need to discard the duplicates. -- Thanks and Regards Mohammad Shariq
Re: Removing duplicate documents from search results
Re: Removing duplicate documents from search results
I found the deduplication thing really useful. Although I have not yet started to work on it, as there are some other low-hanging fruits I have to capture first. Will share my thoughts soon. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny

2011/6/28 François Schiettecatte fschietteca...@gmail.com
Maybe there is a way to get Solr to reject documents that already exist in the index, but I doubt it; maybe someone else can chime in here. You could do a search for each document prior to indexing it to see if it is already in the index, but that is probably non-optimal. It may be easiest to check whether the document exists in your Riak repository: if not, add it and index it; drop it if it already exists. François

On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:
I am making the hash from the URL, but I can't use this as the uniqueKey because I am using a UUID as the uniqueKey. Since I am using Solr as the index engine only and Riak (key-value storage) as the storage engine, I don't want to overwrite on duplicate. I just need to discard the duplicates.

2011/6/28 François Schiettecatte fschietteca...@gmail.com
Create a hash from the url and use that as the unique key; md5 or sha1 would probably be good enough. Cheers François

On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
I also have the problem of duplicate docs. I am indexing news articles. Every news article has a source URL; if two news articles have the same URL, only one needs to be indexed, i.e. removal of duplicates at index time.

On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
Have you checked out the deduplication process that's available at indexing time? This includes a fuzzy hash algorithm. http://wiki.apache.org/solr/Deduplication -Simon

On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote:
This approach would definitely work if the two documents are *exactly* the same. But this is very fragile: even if one extra space has been added, the whole hash would change. What I am really looking for is some percentage similarity between documents, so as to remove those documents which are more than 95% similar. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny

On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
What you need to do is calculate some hash (using any message digest algorithm you want: md5, sha-1 and so on), then do some reading on Solr's field collapse capabilities. Should not be too complicated. *Omri Cohen* Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295

-- Forwarded message --
From: Pranav Prakash pra...@gmail.com
Date: Thu, Jun 23, 2011 at 12:26 PM
Subject: Removing duplicate documents from search results
To: solr-user@lucene.apache.org
How can I remove very similar documents from search results? My scenario is that there are documents in the index which are almost similar (people submitting the same stuff multiple times, sometimes different people submitting the same stuff). Now when a search is performed for a keyword, in the top N results the same document quite frequently comes up multiple times. I want to remove those duplicate (or possibly duplicate) documents, very similar to what Google does when it says "In order to show you the most relevant results, duplicates have been removed." How can I achieve this functionality using Solr? Does Solr have a built-in feature or plugin which could help me with it? *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
-- Thanks and Regards Mohammad Shariq
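Several replies in this thread suggest hashing the article URL (md5 or sha1) to get a dedup key. A rough sketch of that idea in Python (the normalization rules here are assumptions; real URL canonicalization needs more care):

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def url_dedup_key(url: str) -> str:
    """Normalize a URL and return its MD5 hex digest, for use as a dedup key."""
    parts = urlsplit(url.strip())
    # Lowercase scheme/host, drop a trailing slash and the fragment.
    normalized = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                             parts.path.rstrip("/") or "/", parts.query, ""))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same article URL, with cosmetic differences, maps to one key:
a = url_dedup_key("http://Example.com/news/story-123/")
b = url_dedup_key("http://example.com/news/story-123")
print(a == b)  # True
```

As Pranav points out, an exact hash only catches byte-identical duplicates; for "95% similar" documents a fuzzy signature (like the one in Solr's Deduplication) is needed.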
Re: Analyzer creates PhraseQuery
(11/06/28 16:40), lboutros wrote: You could add this filter after the NGram filter to prevent phrase query creation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic. There is also an option on the field type to avoid producing phrase queries, autoGeneratePhraseQueries="false". koji -- http://www.rondhuit.com/en/
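Putting the two suggestions together, a field type along these lines (an illustrative schema.xml sketch; the type name and gram sizes are assumptions) keeps the query parser from turning the NGram tokens into a phrase query:

```xml
<fieldType name="text_ngram" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    <!-- collapse all token positions so no phrase query is generated -->
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>
```

Either fix alone is sufficient; the PositionFilter approach also works on releases that predate the autoGeneratePhraseQueries attribute.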
Re: Removing duplicate documents from search results
Indeed, take a look at this: http://wiki.apache.org/solr/Deduplication I have not used it but it looks like it will do the trick. François

On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:
I found the deduplication thing really useful. Although I have not yet started to work on it, as there are some other low hanging fruits I've to capture. Will share my thoughts soon.
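For reference, the Deduplication wiki page configures an update processor chain along these lines (a solrconfig.xml sketch; the chain name and field list are illustrative). TextProfileSignature is the fuzzy-hash variant Simon mentions; Lookup3Signature hashes exactly:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- fields used to compute the near-duplicate signature -->
    <str name="fields">title,summary</str>
    <!-- fuzzy signature; use Lookup3Signature for exact matching -->
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With overwriteDupes=true an incoming duplicate replaces the stored one, so the index ends up holding a single copy per signature; there is no documented mode that silently rejects the incoming duplicate while keeping the original.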
Re: Removing duplicate documents from search results
Hey François, thanks for your suggestion. I followed the same link (http://wiki.apache.org/solr/Deduplication); the solutions there are either to make the hash the uniqueKey or to overwrite on duplicate. I don't need either: I need *discard on duplicate*.

I have not used it but it looks like it will do the trick. François
Re: Include synonys in solr
Thanks François Schiettecatte, the information you provided is very helpful. I need to know one more thing: I downloaded one of the given dictionaries, but it contains many files. Do I need to add all of these files' data into synonyms.txt? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Include-synonyms-in-solr-tp3116836p3117733.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Removing duplicate documents from search results
Mohammad, just in case you meant it, I would like to discourage you from trying to deduplicate *the search result*. There are many things that go wrong if you do that; we had it in one version of the ActiveMath search environment (which uses Lucene):
- paging is inappropriate
- the total count is wrong unless you go through all the results
- performance can go really bad if you try to go through all the results
- performance does go bad for some search results if you try to fill the page (you need to fetch until you find enough)
- you have to go through all the search results again and again when delivering the next pages
So, as others have suggested, please be sure to deduplicate somehow at indexing time. paul

Le 28 juin 2011 à 14:24, Mohammad Shariq a écrit :
I am making the Hash from URL, but I can't use this as UniqueKey because I am using UUID as UniqueKey. Since I am using Solr as index engine only and Riak (key-value storage) as storage engine, I don't want to overwrite on duplicate. I just need to discard the duplicates.
-- Thanks and Regards Mohammad Shariq
Re: Removing duplicate documents from search results
Yeah, I read the overview, which suggests that duplicates can be prevented from entering the index, and I scanned the rest; it does not look like you can actually drop the incoming document entirely. Maybe I am missing something here. François

On Jun 28, 2011, at 9:14 AM, Mohammad Shariq wrote:
Hey François, thanks for your suggestion. I followed the same link (http://wiki.apache.org/solr/Deduplication); they have the solution: either make the Hash the uniqueKey OR overwrite on duplicate. I don't need either; I need Discard on Duplicate.
Re: Include synonys in solr
Well no, you need to see which files (if any) will suit your needs; they are not all synonym files. I only needed the UK/US English file, and I needed to process it into a format suitable for the synonyms file. There may well be other word lists on the net suitable for your needs. I would not recommend the use of synonyms unless you have a specific need for them. I needed them because we have documents which mix UK/US English, and we need to be able to search on medical terms, e.g. hemoglobin/haemoglobin, and get the same results. Cheers François

On Jun 28, 2011, at 9:21 AM, Romi wrote:
Thanks François Schiettecatte, the information you provided is very helpful. I need to know one more thing: I downloaded one of the given dictionaries, but it contains many files. Do I need to add all of these files' data into synonyms.txt? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Include-synonyms-in-solr-tp3116836p3117733.html Sent from the Solr - User mailing list archive at Nabble.com.
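For reference, the synonyms file that solr.SynonymFilterFactory reads is plain text, one rule per line (a minimal illustration; the word pairs are examples, including the UK/US medical case from this thread):

```
# comma-separated groups are treated as equivalent terms
hemoglobin, haemoglobin
color, colour
# "=>" maps left-hand terms to the right-hand replacement(s)
teh => the
```

So the downloaded word lists would need to be converted into one of these two line formats before being dropped into synonyms.txt.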
Re: multiple spatial values
It is precisely this limitation which triggered me to develop a grid indexing approach using Geohashes: https://issues.apache.org/jira/browse/SOLR-2155 This patch requires a Solr trunk release. If you have a small number of distinct points in total, and you only need filtering, then the geohash field in Solr 3.1 may be fast enough for you. ~ David Smiley

On Jun 28, 2011, at 7:53 AM, Darren Govoni wrote:
Will it be possible to do spatial searches on multi-valued spatial fields soon? I have a latlon field (point) that is multi-valued and I don't know how to search against it such that the lats and lons match correctly, since they are split apart. E.g. I have a document with 10 point/latlon values for the same field.

On 06/28/2011 05:15 AM, marthinal wrote:
Yonik Seeley wrote: On Sat, Jun 25, 2011 at 5:56 AM, marthinal <jm.rodriguez.ve...@gmail.com> wrote: sfield, pt and d can all be specified directly in the spatial functions/filters too, and that will override the global params. Unfortunately one must currently use Lucene query syntax to do an OR. It just makes it look a bit messier. q=_query_:{!geofilt} _query_:{!geofilt sfield=location_2} -Yonik http://www.lucidimagination.com

@Yonik it seems to work like this; I tried hundreds of other possibilities without success: q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq={!geofilt sfield=location_2 pt=40.51,-5.91 d=500}

Ah, right. I had thought you wanted docs that matched either geofilt (hence OR), not docs that only matched both. -Yonik http://www.lucidimagination.com

Yes Yonik, what I do now is q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq=_query_:{!geofilt sfield=location_2 pt=40.51,-5.91 d=500} other_filter:value .. I write the query here because maybe it *helps* someone that needs to do something like this ... -- View this message in context: http://lucene.472066.n3.nabble.com/multiple-spatial-values-tp1555668p3117145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index Version and Epoch Time?
On Tue, Jun 28, 2011 at 4:18 PM, Pranav Prakash pra...@gmail.com wrote: I am not sure what the index number value is. It looks like an epoch time, but in my case it points to one month back. However, I can see documents which were added last week in the index. The index version shown on the dashboard is the time at which the most recent index segment was created. I'm not sure why it has a value older than a month if a commit has happened after that time. Even after I did a commit, the index number did not change. Isn't it supposed to change on every commit? If not, is there a way to look into the last index time? Yeah, it changes after every commit which added/deleted a document. Also, this page http://wiki.apache.org/solr/SolrReplication#Replication_Dashboard shows a Replication Dashboard. How is this dashboard invoked? Is there any URL which needs to be called? If you have configured replication correctly, the admin dashboard should show a Replication link right next to the Schema Browser link. The path should be /admin/replication/index.jsp -- Regards, Shalin Shekhar Mangar.
Using FieldCache in SolrIndexSearcher - crazy idea?
I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or fl=myUniqueKey,score. When the response is being generated to send back to the coordinator, SolrIndexSearcher.doc(int i, Set<String> fields) is called for each document. As I understand it, this will read each document from the index _on disk_ and retrieve the myUniqueKey field value for each document. My idea is to have a FieldCache for the myUniqueKey field in SolrIndexSearcher (or somewhere else?) that would be used in cases where the only field that needs to be retrieved is myUniqueKey. Is this something that would improve performance? In our actual setup, we are using an extended version of QueryComponent that queries for a couple of other fields besides myUniqueKey in the initial query to the shards, and it asks for a lot of rows when doing so, many more than what the user ends up getting back when they see the results. (The reasons for this are complicated and aren't related much to this question.) We already maintain FieldCaches for the fields that we are asking for, but for other purposes. Would it make sense to utilize these FieldCaches in SolrIndexSearcher? Is this something that anyone else has done before? -Michael
Records disappearing
Hi all, I'm having some weird behavior with my dataimport script. Because of memory issues, I've taken to doing my delta imports as a full-import with clean=false. My dataimport config file is set up like:

<entity name="findDelta" rootEntity="false"
        query="SELECT id FROM mytable
               WHERE date_added &gt; '${dataimporter.last_index_time}'
                  OR last_updated &gt; '${dataimporter.last_index_time}'">
  <entity name="mytable" pk="id"
          query="SELECT * FROM mytable WHERE id = '${findDelta.id}'"
          deletedPkQuery="SELECT id FROM my_delete_table"
          deltaImportQuery="SELECT id FROM mytable WHERE id='${dataimporter.delta.id}'"
          deltaQuery="SELECT id FROM mytable
                      WHERE date_added &gt; '${dataimporter.last_index_time}'
                         OR last_updated &gt; '${dataimporter.last_index_time}'">
    <field column="id" name="id" />
    <field column="title" name="title" />
    <field column="name" name="name" />
    <field column="summary" name="summary" />
  </entity>
</entity>

I've found that one record (possibly more that I haven't noticed) keeps disappearing from the index. I will do a full-import with clean=false, search, and the record will be there. I'll search again a few hours later and it's there. But then all of a sudden it's gone. I don't know what is triggering that one record's disappearance, but it is quite annoying. Any ideas what's going on? Thanks, Brian Lamb
Re: Default schema - 'keywords' not multivalued
: I'm streaming over the document content (presumably via tika) and it's
: gathering the document's metadata, which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision, because the keywords field is single
: valued.
:
: I can change this behavior using a copy field. What I wanted to know is if
: there was a specific reason the default schema defined a field like keywords
: as single valued, so I could make sure I wasn't missing something before I
: changed things.

That file is just an example; you're absolutely free to change it to meet your use case. I'm not very familiar with Tika, but based on the comment in the example config...

<!-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word, PDF. Some fields are multiValued only because Tika currently may return multiple values for them. -->

...I suspect it was intentional that that field is *not* multiValued (I guess Tika always returns a single delimited value?), but if you have multiple discrete values you want to send for your DB-backed data there is no downside to changing that.

: While I'm at it, I'd REALLY like to know how to use DIH to index the metadata
: from the database while simultaneously streaming over the document content and
: indexing it. I've never quite figured it out yet but I have to believe it is
: a possibility.

There's a TikaEntityProcessor that can be used to have Tika crunch the data that comes from an entity and extract out specific fields, and it can be used in combination with a JdbcDataSource and a BinFileDataSource, so that a field in your db data specifies the name of a file on disk to use as the Tika entity -- but I've personally never tried it. Here's a simple example someone posted last year that they got working... http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html -Hoss
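The DIH combination Hoss describes can be sketched like this (an illustrative data-config.xml; the table, column, and path names are assumptions, not from the original thread):

```xml
<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docs" />
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <!-- metadata comes from the database row -->
    <entity name="meta" dataSource="db"
            query="SELECT id, title, file_path FROM documents">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <!-- Tika parses the file named by the DB row and extracts its text -->
      <entity name="body" processor="TikaEntityProcessor"
              dataSource="bin" url="${meta.file_path}" format="text">
        <field column="text" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key idea is the nested Tika entity reading its `url` from the parent JDBC entity's row, so each document gets both its DB metadata and its extracted body in one import.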
Does Smart Chinese filter work for Traditional Chinese?
Hi, According to the doc: http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean solr.SmartChineseWordTokenFilterFactory is for Simplified Chinese. Does it work for Traditional Chinese too? If not, is there anything equivalent for Traditional Chinese? Thanks.
Re: Analyzer creates PhraseQuery
Thanks guys. Both the PositionFilterFactory and the autoGeneratePhraseQueries=false solutions solved the issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Analyzer-creates-PhraseQuery-tp3116288p3118471.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index Version and Epoch Time?
Hi, I am facing multiple issues with Solr and I am not sure what happens in each case. I am quite naive in Solr and there are some scenarios I'd like to discuss with you. We have a huge volume of documents to be indexed, somewhere about 5 million. We have a full indexer script, which essentially picks up all the documents from the database and updates them into Solr, and an incremental script, which adds new documents to Solr. Relevant areas of my config file go like:

<unlockOnStartup>false</unlockOnStartup>

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- Keep only optimized commit points -->
  <str name="keepOptimizedOnly">false</str>
  <!-- The maximum number of commit points to be kept -->
  <str name="maxCommitsToKeep">1</str>
</deletionPolicy>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10</maxDocs>
  </autoCommit>
</updateHandler>

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://hostname:port/solr/core0/replication</str>
  </lst>
</requestHandler>

Sometimes the full indexer script breaks while adding documents to Solr. The script adds the documents and then commits the operation, so when the script breaks we have a huge lot of data which has been updated but not committed. Next, the incremental index script executes, figures out all the new entries, and adds them to Solr. It works successfully and commits the operation. - Will the commit by the incremental indexer script also commit the previously uncommitted changes made by the full indexer script before it broke?
Sometimes during execution, Solr's avg response time (avg resp time for the last 10 requests, read from the log file) goes as high as 9000ms (which I am still unclear why; any ideas how to start hunting for the problem?), so the watchdog process restarts Solr (because it causes a pile-up of requests queued at the application server, which causes the app server to crash). On my local environment, I performed the same experiment by adding docs to Solr, killing the process and restarting it. I found that the uncommitted changes were applied and searchable, even though the updates were never committed. Could you explain to me how this is happening, or is there a configuration that can be adjusted for this? Also, what would the index state be if, after restarting Solr, a commit is applied or not applied? I'd be happy to provide any other information that might be needed. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny On Tue, Jun 28, 2011 at 20:55, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jun 28, 2011 at 4:18 PM, Pranav Prakash pra...@gmail.com wrote: I am not sure what the index number value is. It looks like an epoch time, but in my case it points to one month back. However, I can see documents which were added last week in the index. The index version shown on the dashboard is the time at which the most recent index segment was created. I'm not sure why it has a value older than a month if a commit has happened after that time. Even after I did a commit, the index number did not change. Isn't it supposed to change on every commit? If not, is there a way to look into the last index time? Yeah, it changes after every commit which added/deleted a document. Also, this page http://wiki.apache.org/solr/SolrReplication#Replication_Dashboard shows a Replication Dashboard. How is this dashboard invoked? Is there any URL which needs to be called?
If you have configured replication correctly, the admin dashboard should show a Replication link right next to the Schema Browser link. The path should be /admin/replication/index.jsp -- Regards, Shalin Shekhar Mangar.
Re: Custom Query Processing
You should modify the SolrCore for this, if I'm not mistaken. Would extending LuceneQParserPlugin (solr 1.4) be an option for you? On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote: I have a need to take an incoming solr query and apply some additional constraints to it on the Solr end. Our previous implementation used a QueryWrapperFilter along with some custom code to build a new Filter from the query provided. How can we plug this filter into Solr? -- Regards, Dmitry Kan
Re: Unique document count from index?
Can you use facet search? facet=true&facet.field=order_no&fq=order_no:(1234 OR 5678 OR ...)&fq=artist:"Pink Floyd" On Mon, Jun 27, 2011 at 6:44 PM, Olson, Ron rol...@lbpc.com wrote: Hi all- I have a problem that I'm not sure how it can be (if it can be) solved in Solr. I am using Solr 3.2 with patch 2524 installed to provide grouping. I need to return the count of unique records that match a particular query. For an example of what I'm talking about, imagine I have an index of music CD orders, created from a SQL database using the DataImportHandler. It's possible that the person ordered multiple records by the same artist (e.g. order #1234 contains Pink Floyd Wish You Were, Pink Floyd Meddle, Pink Floyd Obscured by Clouds). One of the indexed and stored fields in the document is Artist. If I do a search for Pink Floyd, using the order above, I'd get three documents, all with the same order number, one for each of the Pink Floyd records. What I'd like to find out is how many unique orders have Pink Floyd across the entire index. The index has millions of documents. I have been trying to see if the result grouping functionality provided by patch 2524 will help, but while it does collapse the query above into one document, the matches field is still the same as without the grouping (which I guess makes sense insofar as it is still reporting the number of documents it found for the query). I have also thought a subquery in my DataImportHandler might work, though I'm not sure how I'd structure it. Thanks for any guidance on how to solve this problem; I know Solr isn't meant to be a data-mining tool and I'm guessing I'm skating perilously close to using it for that purpose, but anything I can do to take load off the actual database is considered a Good Thing by all concerned. Ron -- Regards, Dmitry Kan
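As a sketch of Dmitry's suggestion, the facet query can be assembled with ordinary URL encoding. The host, core path, and field names below are assumptions taken from the thread; the number of unique orders would then be the number of facet buckets returned for order_no.

```python
from urllib.parse import urlencode

# Build a faceted query that groups matches by order number.
# Host, core path, and field names are assumptions for illustration.
params = [
    ("q", 'artist:"Pink Floyd"'),
    ("rows", "0"),              # only facet counts are needed, not docs
    ("facet", "true"),
    ("facet.field", "order_no"),
    ("facet.mincount", "1"),    # skip orders with zero matches
]
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

Counting the entries under facet_fields/order_no in the response then gives the unique-order count without pulling back any documents.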
Re: Index Version and Epoch Time?
On 6/28/2011 1:38 PM, Pranav Prakash wrote: - Will the commit by incremental indexer script also commit the previously uncommitted changes made by full indexer script before it broke? Yes, as long as the Solr instance hasn't crashed. Anything added but not yet committed sticks around and will be committed on the next 'commit'. There are no 'transactions' for adding docs in Solr; even if multiple processes are adding, if any one of them issues a 'commit' they'll all be committed. Sometimes, during execution, Solr's avg response time (avg resp time for last 10 requests, read from log file) goes as high as 9000ms (which I am still unclear why, any ideas how to start hunting for the problem?), It could be a Java garbage collection issue. I have found it useful to start the JVM with Solr in it using some parameters to tune garbage collection. I use these JVM options: -server -XX:+AggressiveOpts -d64 -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops You've still got to make sure Solr has enough memory for what you're doing with it, which with your 5 million doc index might be more than you expect. On the other hand, giving a JVM too _much_ heap can cause slowdowns too, although I think -XX:+UseConcMarkSweepGC should ameliorate that to some extent. Possibly more likely, it could instead be Solr readying the new indexes. Do you issue commits in the middle of 'execution', and could the slowdown happen right after a commit? When a commit is issued to Solr, Solr's got to switch in new indexes with the newly added documents, and 'warm' those indexes in various ways, which can be a CPU (as well as RAM) intensive thing. (For these purposes a replication from master counts as a commit (because it is), and an optimize can count too (because it's close enough).) 
This can be especially a problem if you issue multiple commits very close together -- Solr's still working away at readying the index from the first commit when the second comes in, and now Solr's trying to ready two indexes at once (one of which will never be used because it's already outdated). Or even more than two, if you issue a bunch of commits in rapid succession. I found that the uncommitted changes were applied and searchable. However, the updates were uncommitted. There is in general no way that uncommitted adds could be searchable, so that's probably not what is happening. What is probably happening instead is that a commit _is_ happening. One way a commit can happen even if you aren't manually issuing one is via the various auto-commit settings in solrconfig.xml. Commit any pending adds after X documents, or after T seconds: both can be configured. If they are configured, that could be causing commits to happen when you don't realize it, which could also trigger the slowdown due to a commit mentioned in the previous paragraph. Jonathan
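The auto-commit settings Jonathan refers to are set on the update handler in solrconfig.xml; a minimal sketch (the thresholds here are placeholder values, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after this many pending documents... -->
    <maxDocs>10000</maxDocs>
    <!-- ...or after this many milliseconds, whichever comes first -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

Note that the config Pranav posted earlier in the thread already contains an autoCommit block with maxDocs set to 10, which would make Solr commit automatically every 10 added documents.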
moving to multicore without changing existing index
hi I'm looking at setting up multi-core indices but also have an existing index. Can I run this index alongside a new index set up as cores? On a dev machine I've experimented with simply adding solr.xml in solr home and listing the new cores in the cores element, but this breaks the existing index. Container is tomcat and attempted set up was:

solrHome
  conf (existing running index)
  core1 (new core directory)
  solr.xml (cores element has one entry for core1)

Is this a valid approach? thanks lee
Re: moving to multicore without changing existing index
Nope. But you can move your existing index into a core in a multi-core setup. A multi-core setup is a multi-core setup, though: there's no way to have an index accessible at a non-core URL in a multi-core setup. On 6/28/2011 2:53 PM, lee carroll wrote: hi I'm looking at setting up multi-core indices but also have an existing index. Can I run this index alongside a new index set up as cores? On a dev machine I've experimented with simply adding solr.xml in solr home and listing the new cores in the cores element, but this breaks the existing index. Container is tomcat and attempted set up was: solrHome / conf (existing running index) / core1 (new core directory) / solr.xml (cores element has one entry for core1) Is this a valid approach? thanks lee
Dynamic Fields vs. Multicore
Hi All, I was searching around for documentation of the performance differences of having a sharded, single schema, dynamic field set up vs. a multi-core, static multi-schema setup (which I currently have), but I have not had much luck finding what I am looking for. I understand commits and optimizes will be more intensive in a single core since there is more data (though I would offset by sharding heavily), but I am particularly curious about the search performance implications. I am interested in moving to the dynamic field setup in order to implement a better global search, but I want to make sure I understood the drawbacks of hitting those datasets individually and globally after they are merged (NOTE: I would have a global field signifying the dataset type, which could then be added to the filter query in order to create the subset for individual dataset queries). Some background about the data: it is extremely variable. Some documents contain only 2 or 3 sentences, and some are 20 page extracted PDFs. There would probably only be about 100-150 unique fields. Any input is greatly appreciated! Thanks, Briggs Thompson
Solr - search queries not returning results
Hello everyone, I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc Do you have any idea what the issue may be? Thanks in advance!
overwrite if not already in index?
Quick question, is there a way with Solr to conditionally update a document on unique id? Meaning: the default add behavior if the id is not already in the index, and *not* touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed that with deduplication turned on, index files get modified even if I update the same documents again (same signatures). I am facing a very high dupes rate (40-50%), and the setup is going to be master-slave with a high commit rate (the requirement is to reduce propagation latency for updates). Unnecessary index modifications are going to waste effort shipping the same information again and again. If there is no standard way, what would be the fastest way to check if a Term exists in the index from an UpdateRequestProcessor? I intend to extend SignatureUpdateProcessor to prevent a document from propagating down the chain if this happens. Would that be a way to deal with it? I repeat, there are no deletes to make headaches with synchronization. Thanks, eks
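For reference, deduplication like eks describes is wired up as an update processor chain in solrconfig.xml; a sketch along the lines of the Solr wiki's deduplication example (the field list and signature class are illustrative):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- with overwriteDupes=true, existing docs carrying the same
         signature are deleted before the add goes through -->
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

That delete-then-add behavior under overwriteDupes may account for the index files changing even when the same documents (same signatures) are re-sent.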
Re: Solr - search queries not returning results
Hi Walter, probably solritas is using dismax with a set of fields in the qf parameter, while with your first query you are just querying the default field. On Tue, Jun 28, 2011 at 5:07 PM, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: Hello everyone, I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc Do you have any idea what the issue may be? Thanks in advance!
edismax - Handling collocations mapped to a single token . . ?
We are trying to get edismax to handle collocations mapped to a single token. To do so we need to manipulate the chunks (as Hoss referred to them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/) generated by the dismax parser. We have numerous collocations (terms of speech which do not directly relate to the constituent words that make up the saying). For example, at index time real estate is mapped to real_estate to avoid it colliding with searches for estate or real value. So we need the chunks to reflect this mapping of multi-word phrases to a single token that is done during indexing (via the synonym filter). In an ideal world, we would just list the queryAnalyzerFieldType that should be used in pre-processing the query string before it is divided into chunks (similar to what is done with the SpellChecker Component). But our impression thus far is that we are off the reservation and will need to hack away at org.apache.solr.search.ExtendedDismaxQParser.splitIntoClauses(String, boolean). Is it correct that the only pre-processing by dismax is on stopwords? Is it correct to be able to limit customization to splitIntoClauses(String, boolean) to handle this? Regards, Christopher
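One client-side stopgap (not the parser-level hook Christopher is asking for) is to apply the same collocation mapping to the raw query string before it ever reaches dismax, so the chunks already contain the fused tokens. A sketch with a toy synonym table:

```python
import re

# Toy collocation table; in practice this would mirror the index-time
# synonym file (e.g. "real estate => real_estate").
COLLOCATIONS = {
    "real estate": "real_estate",
    "ice cream": "ice_cream",
}

def map_collocations(query: str) -> str:
    """Replace each multi-word collocation with its single-token form."""
    # Longest phrases first so overlapping entries don't clobber each other.
    for phrase in sorted(COLLOCATIONS, key=len, reverse=True):
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        query = pattern.sub(COLLOCATIONS[phrase], query)
    return query

print(map_collocations("cheap real estate listings"))
```

This keeps ExtendedDismaxQParser untouched, at the cost of duplicating the synonym table outside the Solr analysis chain.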
Re: moving to multicore without changing existing index
But a multi-core setup is a multi-core setup, there's no way to have an index accessible at a non-core URL in a multi-core setup. Isn't there? What about the defaultCoreName parameter? From the wiki: The name of a core that will be used for requests that don't specify a core. If you have one core and want to use the features specified on this page, then this provides a way to keep your URLs the same. You will need to set up the directory structure for that core, something like:

solrHome
  originalCore (new core directory)
    conf (existing running index)
  core1 (new core directory)
    conf (new configuration)
  solr.xml (declare both cores, and set originalCore as defaultCoreName)

Haven't tried it, but I think it should work. See http://wiki.apache.org/solr/CoreAdmin#solr On Tue, Jun 28, 2011 at 3:57 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Nope. But you can move your existing index into a core in a multi-core setup. But a multi-core setup is a multi-core setup, there's no way to have an index accessible at a non-core URL in a multi-core setup. On 6/28/2011 2:53 PM, lee carroll wrote: hi I'm looking at setting up multi-core indices but also have an existing index. Can I run this index alongside a new index set up as cores? On a dev machine I've experimented with simply adding solr.xml in solr home and listing the new cores in the cores element, but this breaks the existing index. Container is tomcat and attempted set up was: solrHome / conf (existing running index) / core1 (new core directory) / solr.xml (cores element has one entry for core1) Is this a valid approach? thanks lee
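A solr.xml matching that layout might look like the following - a sketch in the legacy multicore format of that era; whether defaultCoreName is honored depends on the Solr version:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="originalCore">
    <!-- requests without a core name fall through to originalCore,
         so existing non-core URLs keep working -->
    <core name="originalCore" instanceDir="originalCore" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```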
How to Create a weighted function (dismax or otherwise)
I am trying to create a feature that allows search results to be ranked by this formula: sum(weight1 * text relevance score, weight2 * price). weight1 and weight2 are numeric values that can be changed to influence the search results. I am sending the following query params to the Solr instance for searching: q=red defType=dismax qf=name^10+price^2 My understanding is that when using dismax, Solr/Lucene looks for the search text in all the fields specified in the qf param. Currently my search results are similar to those I get when qf does not include price. I think this is because price is a numeric field and there is no text match. Is it possible to rank search results based on this formula - sum(weight1 * text relevance score, weight2 * price)? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-Create-a-weighted-function-dismax-or-otherwise-tp3119977p3119977.html Sent from the Solr - User mailing list archive at Nabble.com.
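qf boosts only weight the text-relevance side of the formula. One common way to fold price into the score (a sketch, assuming dismax's additive bf boost-function parameter; field names and weights are illustrative) is:

```python
from urllib.parse import urlencode

# weight1 is expressed through the qf field boost; weight2 through bf,
# which adds the value of a function query to the relevance score.
params = [
    ("q", "red"),
    ("defType", "dismax"),
    ("qf", "name^10"),            # text relevance, weighted
    ("bf", "product(price,2)"),   # adds weight2 * price to the score
]
qs = urlencode(params)
print(qs)
```

Since dismax relevance scores are not normalized, this is sum(weight1 * relevance, weight2 * price) only approximately; the constants usually need empirical tuning against real queries.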
Fuzzy Query Param
According to the docs on Lucene query syntax: Starting with Lucene 1.9, an additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. I was messing around with this and started doing queries with values greater than 1, and it seemed to be doing something. However, I haven't been able to find any documentation on this. What happens when specifying a fuzzy query with a value > 1? tiger~2 animal~3 -- View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3120235.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using RAMDirectoryFactory in Master/Slave setup
Using RAMDirectory really does not help performance. Java garbage collection has to work around all of the memory taken by the segments. It works out that Solr works better (for most indexes) without using the RAMDirectory. On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote: PS: Sorry if this is a repost, I was unable to see my message in the mailing list - this may have been due to my outgoing email being different from the one I used to subscribe to the list with. Overview - Trying to evaluate if keeping the index in memory using RAMDirectoryFactory can help query performance. I am trying to perform the indexing on the master using solr.StandardDirectoryFactory and make those indexes accessible to the slave using solr.RAMDirectoryFactory. Details: We have set up Solr in a master/slave environment. The index is built on the master and then replicated to slaves, which are used to serve the queries. The replication is done using the built-in Java replication in Solr. On the master, in the indexDefaults of solrconfig.xml we have: <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/> On the slave, I tried to use the following in the indexDefaults: <directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/> My slave shows no data for any queries. In solrconfig.xml it is mentioned that replication doesn't work when using RAMDirectoryFactory; however, this (https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use it to have the index on disk and then load it into memory. To test the sanity of my set-up, I changed solrconfig.xml on the slave to the following and replicated: <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/> I was able to see the results. Shouldn't RAMDirectoryFactory be used for reading the index from disk into memory? Any help/pointers in the right direction would be appreciated. Thanks! 
-- View this message in context: http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com