Re: using PositionIncrementAttribute to increment certain term positions to large values

2012-12-27 Thread Dmitry Kan
Hi,

Answering my own question for the record: experiments show that the
described functionality is achievable with a TokenFilter implementation.
The only caveat is that the Highlighter component stops working properly
if the match position goes beyond the length of the text field.

As for performance, no major delays have been noticed compared to the
original proximity search implementation.

Best,

Dmitry Kan

On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Dear list,

 We are currently evaluating proximity searches ("term1 term2"~slop) for
 a specific use case. In particular, each document contains artificial
 delimiter characters (one character between each pair of sentences in the
 text). Our goal is to hit the sentences individually for any proximity
 search and avoid cross-sentence matches.

 We figured that, by using PositionIncrementAttribute as a field in a
 descendant of the TokenFilter class, it is possible to set the position
 increment of each artificial character (which is a term in Lucene/Solr
 notation) to an arbitrarily large number. Thus any proximity search with a
 reasonably small slop value should automatically hit within sentence
 boundaries.

 Does this sound like the right way to tackle the problem? Are there any
 performance costs involved?

 Thanks in advance for any input,

 Dmitry Kan
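For the record, the position bookkeeping described in this thread can be sketched in plain Java. This is only a simulation of what a custom TokenFilter would do through PositionIncrementAttribute.setPositionIncrement(); the delimiter token "#" and the gap of 1000 are illustrative assumptions, not values from the original posts:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SentenceGapDemo {
    static final String DELIMITER = "#";   // hypothetical sentence-boundary token
    static final int SENTENCE_GAP = 1000;  // large position increment at boundaries

    // Assign token positions the way a TokenFilter would via
    // PositionIncrementAttribute: +1 normally, +SENTENCE_GAP at a delimiter.
    // (One entry per distinct term -- enough for a demo.)
    public static Map<String, Integer> positions(String[] tokens) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        int p = 0;
        for (String t : tokens) {
            p += t.equals(DELIMITER) ? SENTENCE_GAP : 1;
            if (!t.equals(DELIMITER)) pos.put(t, p);
        }
        return pos;
    }

    public static void main(String[] args) {
        Map<String, Integer> pos =
            positions(new String[] {"quick", "fox", DELIMITER, "lazy", "dog"});
        System.out.println(pos); // {quick=1, fox=2, lazy=1003, dog=1004}
    }
}
```

With positions assigned this way, a proximity query like "quick fox"~10 stays inside one sentence, while "fox lazy"~10 can never bridge the 1000-position gap at the delimiter - which also explains the Highlighter caveat above: match positions can exceed the field's actual length.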



Re: Which token filter can combine 2 terms into 1?

2012-12-27 Thread Dmitry Kan
Hi,

Have a look at TokenFilter. Extending it will give you access to a
TokenStream.

Regards,

Dmitry Kan

On Fri, Dec 21, 2012 at 9:05 AM, Xi Shen davidshe...@gmail.com wrote:

 Hi,

 I am looking for a token filter that can combine 2 terms into 1. E.g.,

 the input has been tokenized by whitespace:

 t1 t2 t2a t3

 I want a filter that outputs:

 t1 t2t2a t3

 I know it is a very special case, and I am thinking about developing a
 filter of my own. But I cannot figure out which API I should use to look
 for terms in a TokenStream.


 --
 Regards,
 David Shen

 http://about.me/davidshen
 https://twitter.com/#!/davidshen84
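The look-ahead buffering such a filter needs (peek at the next token, emit a merged term) can be sketched in plain Java; a real Lucene TokenFilter would do the same inside incrementToken() using CharTermAttribute and captureState()/restoreState(). The merge rule below, joining a token with its successor when the successor starts with it, is a hypothetical stand-in for whatever rule the use case actually needs:

```java
import java.util.ArrayList;
import java.util.List;

public class CombineDemo {
    // Merge each token with its successor when the successor starts with it
    // (hypothetical rule reproducing the "t1 t2 t2a t3" -> "t1 t2t2a t3" example).
    public static List<String> combine(List<String> in) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            String t = in.get(i);
            if (i + 1 < in.size() && in.get(i + 1).startsWith(t)) {
                out.add(t + in.get(i + 1)); // emit the merged term...
                i++;                        // ...and consume the buffered token
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(combine(List.of("t1", "t2", "t2a", "t3")));
        // prints [t1, t2t2a, t3]
    }
}
```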



search with spaces

2012-12-27 Thread Sangeetha
Hi,

I have a text field with the value O O Jaane Jaane. When I search with
*q=Jaane Jaane* it gives results. But if I give *q=O O Jaane Jaane* it does
not work. What could be the reason?

Thanks,
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-with-spaces-tp4029265.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search with spaces

2012-12-27 Thread Chandan Tamrakar
Which analyzer is used for the field that was indexed?
Maybe you can use the Solr admin Analysis page to see how your text is indexed.

thanks
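Since the schema was not posted, one plausible cause (an assumption, not a diagnosis) is that a stage in the index-time analysis chain, such as a LengthFilter with a minimum of 2 or a stopword filter, drops the single-character "O" terms, so the full phrase can never match. A plain-Java sketch of that kind of mismatch:

```java
import java.util.ArrayList;
import java.util.List;

public class AnalysisMismatchDemo {
    // Simulate an analysis chain: whitespace tokenizer, lowercasing, and a
    // LengthFilter(min = minLength). Tokens shorter than minLength never
    // reach the index.
    public static List<String> analyze(String text, int minLength) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (t.length() >= minLength) terms.add(t.toLowerCase());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("O O Jaane Jaane", 2)); // [jaane, jaane]
        // The "O" terms are gone at index time, so the phrase
        // "O O Jaane Jaane" cannot match, while "Jaane Jaane" can.
    }
}
```

The admin Analysis page shows exactly which stage drops or rewrites a token, which is why it is the right first stop here.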

On Thu, Dec 27, 2012 at 2:30 PM, Sangeetha sangeetha...@gmail.com wrote:

 Hi,

 I have a text field with value O O Jaane Jaane. When i search with *q=Jaane
 Jaane* it is giving the results. But if i give *q=O O Jaane Jaane* it is
 not
 working? What could be the reason?

 Thanks,
 Sangeetha







-- 
Chandan Tamrakar


solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Hi,

I am having trouble getting solr + jetty to work. I am following all
instructions to the letter from http://wiki.apache.org/solr/SolrJetty. I
also created a work folder, /opt/solr/work, and I am setting tmpdir to a
new path in /etc/default/jetty. I have confirmed that tmpdir is set to the
new path from the admin dashboard, under args.

It works like a charm at first. But when I restart jetty multiple times,
after 3 or 4 such restarts it starts hanging. Admin pages just don't load
and my app fails to acquire a connection with solr.

What might I be missing? Should I rather be looking at my code to see if I
am not committing correctly?

Please let me know if you have faced a similar issue in the past and how
you tackled it.

Thank you.

-- 
Best Regards,
Sushrut


Re: Reindex ALL Solr CORES in one GO..

2012-12-27 Thread Anupam Bhattacharya
Thanks Gora,

I can definitely trigger the full re-indexing using curl for multiple
cores, although if I try to index many cores (more than 4-5)
simultaneously, the re-indexing fails due to DB connection pool problems
(connection not available). Thus I need to schedule each indexing run once
the previous one is over. Unfortunately, to track the indexing status of a
core one needs to keep pinging the server to check for completion. Is
there a way to get a response from Solr once the indexing is complete?

How can I increase the connection pool size in Solr?

Regards
Anupam


On Wed, Dec 26, 2012 at 7:06 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 26 December 2012 18:06, Anupam Bhattacharya anupam...@gmail.com
 wrote:
  Hello Everyone,
 
  Is it possible to schedule full reindexing of all solr cores without
 going
  to individually to the DIH screen of each core ?

 One could quite easily write a wrapper around Solr's
 URLs for indexing. You could use a tool like curl, a
 simple shell script, or pretty much any programming
 language to do this.

 Regards,
 Gora




-- 
Thanks & Regards
Anupam Bhattacharya
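The wrapper Gora suggests can also solve the sequencing problem: poll each core's /dataimport status and start the next full-import only once the previous one is no longer busy. The sketch below hides the HTTP call behind a function so the sequencing logic stands alone; the "busy"/"idle" strings echo DIH's status values, but the core names and everything else are hypothetical:

```java
import java.util.List;
import java.util.function.Function;

public class SequentialReindexer {
    // Kick off a full-import for each core in turn, polling the (injected)
    // status function until the running import is no longer "busy".
    // In real code statusOf would GET /solr/<core>/dataimport?command=status.
    public static void run(List<String> cores,
                           Function<String, String> statusOf,
                           StringBuilder log) {
        for (String core : cores) {
            log.append("full-import ").append(core).append('\n');
            while ("busy".equals(statusOf.apply(core))) {
                try {
                    Thread.sleep(100); // back off between status polls
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        run(List.of("core0", "core1"), core -> "idle", log); // stub: idle at once
        System.out.print(log); // full-import core0, then full-import core1
    }
}
```

This also keeps the DB connection pool happy, since at most one core imports at a time.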


Re: Reindex ALL Solr CORES in one GO..

2012-12-27 Thread Ahmet Arslan
 Unfortunately to track the
 status of
 indexing for a core one need to keeping pinging the server
 to check
 completion status. Is there a way to get a response from
 SOLR once the
 indexing is complete ?

Yes, it is possible:
http://wiki.apache.org/solr/DataImportHandler#EventListeners


Re: Dynamic collections in SolrCloud for log indexing

2012-12-27 Thread Otis Gospodnetic
Added https://issues.apache.org/jira/browse/SOLR-4237

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Tue, Dec 25, 2012 at 9:13 PM, Mark Miller markrmil...@gmail.com wrote:

 I've been thinking about aliases for a while as well. They seem very handy
 and fairly easy to implement. So far there have just always been
 higher-priority things (need to finish collection API responses this
 week…) but this is something I'd definitely help work on.

 - Mark

 On Dec 25, 2012, at 1:49 AM, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

  Hi,
 
  Right, this is not really about routing in the ElasticSearch sense.
  What's handy for indexing logs are index aliases, which I thought I had
  added to JIRA a while back, but it looks like I have not.
  Index aliases would let you keep a "last 7 days" alias fixed while
  underneath you push and pop an index every day without the client app
  having to adjust.
 
  Otis
  --
  Performance Monitoring - http://sematext.com/spm/index.html
  Search Analytics - http://sematext.com/search-analytics/index.html
 
 
 
  On Mon, Dec 24, 2012 at 4:30 AM, Per Steffensen st...@designware.dk
 wrote:
 
  I believe it is a misunderstanding to use custom routing (or sharding, as
  Erick calls it) for this kind of stuff. Custom routing is nice if you want
  to control which slice/shard under a collection a specific document goes
  to - mainly to be able to control that two (or more) documents are indexed
  on the same slice/shard, but also just to be able to control on which
  slice/shard a specific document is indexed. Knowing/controlling this kind
  of stuff can be used for a lot of nice purposes. But you don't want to
  move slices/shards around among collections or delete/add slices from/to
  a collection - unless it's for elasticity reasons.
 
  I think you should fill a collection every week/month and just keep those
  collections as is. Instead of ending up with a big historic collection
  containing many slices/shards/cores (one for each historic week/month),
  you will end up with many historic collections (one for each historic
  week/month). Searching historic data you will have to cross-search those
  historic collections, but that is no problem at all. If SolrCloud is made
  as it is supposed to be made (and I believe it is), it shouldn't require
  more resources or be harder in any way to cross-search X slices across
  many collections than it is to cross-search X slices under the same
  collection.
 
  Besides that, see my answer in the topic "Will SolrCloud always slice by
  ID hash?" a few days back.
 
  Regards, Per Steffensen
 
 
  On 12/24/12 1:07 AM, Erick Erickson wrote:
 
  I think this is one of the primary use-cases for custom sharding. Solr 4.0
  doesn't really lend itself to this scenario, but I _believe_ that the
  patch for custom sharding has been committed...

  That said, I'm not quite sure how you drop off the old shard if you don't
  need to keep old data. I'd guess it's possible, but haven't implemented
  anything like that myself.
 
  FWIW,
  Erick
 
 
  On Fri, Dec 21, 2012 at 12:17 PM, Upayavira u...@odoko.co.uk wrote:
 
  I'm working on a system for indexing logs. We're probably looking at
  filling one core every month.

  We'll maintain a short term index containing the last 7 days - that one
  is easy to handle.

  For the longer term stuff, we'd like to maintain a collection that will
  query across all the historic data, but that means every month we need
  to add another core to an existing collection, which as I understand it
  in 4.0 is not possible.
 
  How do people handle this sort of situation where you have rolling new
  content arriving? I'm sure I've heard people using SolrCloud for this
  sort of thing.
 
  Given it is logs, distributed IDF has no real bearing.
 
  Upayavira
 
 
 




Re: Which token filter can combine 2 terms into 1?

2012-12-27 Thread Mattmann, Chris A (388J)
Hi Guys,

I also worked on a CombiningTokenFilter, see:

https://issues.apache.org/jira/browse/LUCENE-3413


Patch has been up and available for a while.

HTH!

Cheers,
Chris


On 12/27/12 12:26 AM, Dmitry Kan solrexp...@gmail.com wrote:

Hi,

Have a look at TokenFilter. Extending it will give you access to a
TokenStream.

Regards,

Dmitry Kan

On Fri, Dec 21, 2012 at 9:05 AM, Xi Shen davidshe...@gmail.com wrote:

 Hi,

 I am looking for a token filter that can combine 2 terms into 1? E.g.

 the input has been tokenized by white space:

 t1 t2 t2a t3

 I want a filter that output:

 t1 t2t2a t3

 I know it is a very special case, and I am thinking about develop a
filter
 of my own. But I cannot figure out which API I should use to look for
terms
 in a Token Stream.


 --
 Regards,
 David Shen

 http://about.me/davidshen
 https://twitter.com/#!/davidshen84




Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi Lance,

Thanks for the response.

I didn't quite understand how to issue the queries from DirectSpellChecker
with the fq params applied like you were suggesting - could you point me to
the API that can be used for this?

Also, we haven't benchmarked the DirectSpellChecker against the
IndexBasedSpellChecker.

I considered issuing one large OR query with all corrections, but that
doesn't ensure that *every* correction would return some hits with the fq
params applied; it only tells us that some correction returned hits, so it
isn't restrictive enough for us. And ANDing the corrections together
becomes too restrictive, since it requires that *all* corrections exist in
the same documents instead of checking that they individually exist in
some docs (which satisfy the filter queries, of course).

Thanks,
Nalini


On Wed, Dec 26, 2012 at 9:32 PM, Lance Norskog goks...@gmail.com wrote:

 A Solr filter query runs a boolean query, caches the resulting Lucene data
 structure, and uses it as a Lucene filter. After that, until you do a full
 commit, using the same fq= string (you must match the string exactly)
 fetches the cached data structure and uses it again as a Lucene filter.

 Have you benchmarked the DirectSpellChecker against
 IndexBasedSpellChecker? If you use the fq= filter query as the
 spellcheck.q= query it should use the cached filter.

 Also, since you are checking all words against the same filter query, can
 you just do one large OR query with all of the words?


 On 12/26/2012 03:10 PM, Nalini Kartha wrote:

 Hi Otis,

 Sorry, let me be more specific.

 The end goal is for the DirectSpellChecker to make sure that the
 corrections it is returning will return some results taking into account
 the fq params included in the original query. This is a follow up question
 to another question I had posted earlier -

 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3ccamqozyftgiwyrbvwsdf0hfz1sznkq9gnbjfdb_obnelsmvr...@mail.gmail.com%3E

 Initially, the way I was thinking of implementing this was to call one of
 the SolrIndexSearcher.getDocSet() methods for every correction, passing in
 the correction as the Query and a DocSet created from the fq queries. But I
 didn't think that calling a SolrIndexSearcher method in Lucene code
 (DirectSpellChecker) was a good idea. So I started looking at which method
 on IndexSearcher would accomplish this. That's where I'm stuck, trying to
 figure out how to convert the fq params into a Filter object.

 Does this approach make sense? Also, I realize that this implementation is
 probably non-performant, but I wanted to give it a try and measure how it
 does. Any advice on what the perf overhead of issuing such queries for,
 say, 50 corrections would be? Note that the filter from the fq params is
 the same for every query - would that be cached and help speed things up?

 Thanks,
 Nalini


 On Wed, Dec 26, 2012 at 3:34 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

  Hi,

 The fq *is* for filtering.

 What is your end goal, what are you trying to achieve?

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On Dec 26, 2012 11:22 AM, Nalini Kartha nalinikar...@gmail.com
 wrote:

  Hi,

 I'm trying to figure out how to convert the fq params that are being

 passed

 to Solr into something that can be used to filter the results of a query
 that's being issued against the Lucene IndexSearcher (I'm modifying some
 Lucene code to issue the query so calling through to one of the
 SolrIndexSearcher methods would be ugly).

 Looks like one of the IndexSearcher.search(Query query, Filter filter,

 ...)

   methods would do what I want but I'm wondering if there's any easy way

 of

 converting the fq params into a Filter? Or is there a better way of
 doing
 all of this?

 Thanks,
 Nalini





Re: Converting fq params to Filter object

2012-12-27 Thread Erik Hatcher
I think the answer is yes, that there's a better way to do all of this. But
I'm not yet sure what this all entails in your situation. What are you
overriding with the Lucene searches? I imagine Solr has the flexibility to
handle what you're trying to do without overriding anything core in
SolrIndexSearcher.

Generally, the way to get a custom filter in place is to create a custom query 
parser and use that for your fq parameter, like fq={!myparser param1='some 
value'}possible+expression+if+needed, so maybe that helps?

Tell us more about what you're doing specifically, and maybe we can guide you 
to a more elegant way to plug in any custom logic you want.

Erik

On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:

 Hi,
 
 I'm trying to figure out how to convert the fq params that are being passed
 to Solr into something that can be used to filter the results of a query
 that's being issued against the Lucene IndexSearcher (I'm modifying some
 Lucene code to issue the query so calling through to one of the
 SolrIndexSearcher methods would be ugly).
 
 Looks like one of the IndexSearcher.search(Query query, Filter filter, ...)
 methods would do what I want but I'm wondering if there's any easy way of
 converting the fq params into a Filter? Or is there a better way of doing
 all of this?
 
 Thanks,
 Nalini



Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi Erik,

Sorry, I think I wasn't very clear in explaining what we need to do.

We don't really need to do any complicated overriding, just want to change
the DirectSpellChecker to issue a query for every correction it finds *with
fq params from the original query taken into account* so that we can check
if the correction would actually result in some hits.

I was thinking of implementing this using the IndexSearcher.search(Query
query, Filter filter, int n) method where 'query' is a regular TermQuery
(the term is the correction) and 'filter' would represent the fq params.
What I'm not sure about is how to convert the fq params from Solr into a
Filter object and whether this is something we need to build ourselves or
if there's an existing API for this.

Also, I'm new to this code so not sure if I'm approaching this the wrong
way. Any advice/pointers are much appreciated.

Thanks,
Nalini
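For what it's worth, inside Solr the usual conversion is QParser.getParser(fqString, null, req).getQuery(), and the resulting Query can be wrapped in a Lucene QueryWrapperFilter (or a CachingWrapperFilter). Since that path needs a live SolrQueryRequest, here is a self-contained plain-Java stand-in showing only the shape of the operation, parse each fq and intersect the allowed doc sets; the field/value document model is invented for the demo:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FqFilterDemo {
    // docs: docId -> field/value pairs; fqs: "field:value" strings.
    // Intersect the doc sets allowed by each fq, mimicking how every fq
    // param becomes a filter that is ANDed onto the main query.
    public static Set<Integer> filter(Map<Integer, Map<String, String>> docs,
                                      List<String> fqs) {
        Set<Integer> allowed = new HashSet<>(docs.keySet());
        for (String fq : fqs) {
            String[] kv = fq.split(":", 2); // crude stand-in for a QParser
            allowed.removeIf(id -> !kv[1].equals(docs.get(id).get(kv[0])));
        }
        return allowed;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> docs = Map.of(
                1, Map.of("color", "red", "size", "s"),
                2, Map.of("color", "red", "size", "m"),
                3, Map.of("color", "blue", "size", "m"));
        System.out.println(filter(docs, List.of("color:red", "size:m"))); // [2]
    }
}
```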



On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 I think the answer is yes, that there's a better way to doing all of this.
  But I'm not yet sure what this all entails in your situation.  What are
 you overriding with the Lucene searches?   I imagine Solr has the
 flexibility to handle what you're trying to do without overriding anything
 core in SolrIndexSearcher.

 Generally, the way to get a custom filter in place is to create a custom
 query parser and use that for your fq parameter, like fq={!myparser
 param1='some value'}possible+expression+if+needed, so maybe that helps?

 Tell us more about what you're doing specifically, and maybe we can guide
 you to a more elegant way to plug in any custom logic you want.

 Erik

 On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:

  Hi,
 
  I'm trying to figure out how to convert the fq params that are being
 passed
  to Solr into something that can be used to filter the results of a query
  that's being issued against the Lucene IndexSearcher (I'm modifying some
  Lucene code to issue the query so calling through to one of the
  SolrIndexSearcher methods would be ugly).
 
  Looks like one of the IndexSearcher.search(Query query, Filter filter,
 ...)
  methods would do what I want but I'm wondering if there's any easy way of
  converting the fq params into a Filter? Or is there a better way of doing
  all of this?
 
  Thanks,
  Nalini




Re: Converting fq params to Filter object

2012-12-27 Thread Erik Hatcher
Apologies for misunderstanding.

Does what you're trying to do already work using the maxCollationTries
feature of the spellcheck component?
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollationTries

It looks like it even passes the fq's through, so that the hit count in the
extended results is inclusive of the filters.

Maybe I'm missing something though, sorry.

Erik

On Dec 27, 2012, at 14:09 , Nalini Kartha wrote:

 Hi Eric,
 
 Sorry, I think I wasn't very clear in explaining what we need to do.
 
 We don't really need to do any complicated overriding, just want to change
 the DirectSpellChecker to issue a query for every correction it finds *with
 fq params from the original query taken into account* so that we can check
 if the correction would actually result in some hits.
 
 I was thinking of implementing this using the IndexSearcher.search(Query
 query, Filter filter, int n) method where 'query' is a regular TermQuery
 (the term is the correction) and 'filter' would represent the fq params.
 What I'm not sure about is how to convert the fq params from Solr into a
 Filter object and whether this is something we need to build ourselves or
 if there's an existing API for this.
 
 Also, I'm new to this code so not sure if I'm approaching this the wrong
 way. Any advice/pointers are much appreciated.
 
 Thanks,
 Nalini
 
 
 
 On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 
 I think the answer is yes, that there's a better way to doing all of this.
 But I'm not yet sure what this all entails in your situation.  What are
 you overriding with the Lucene searches?   I imagine Solr has the
 flexibility to handle what you're trying to do without overriding anything
 core in SolrIndexSearcher.
 
 Generally, the way to get a custom filter in place is to create a custom
 query parser and use that for your fq parameter, like fq={!myparser
 param1='some value'}possible+expression+if+needed, so maybe that helps?
 
 Tell us more about what you're doing specifically, and maybe we can guide
 you to a more elegant way to plug in any custom logic you want.
 
Erik
 
 On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:
 
 Hi,
 
 I'm trying to figure out how to convert the fq params that are being
 passed
 to Solr into something that can be used to filter the results of a query
 that's being issued against the Lucene IndexSearcher (I'm modifying some
 Lucene code to issue the query so calling through to one of the
 SolrIndexSearcher methods would be ugly).
 
 Looks like one of the IndexSearcher.search(Query query, Filter filter,
 ...)
 methods would do what I want but I'm wondering if there's any easy way of
 converting the fq params into a Filter? Or is there a better way of doing
 all of this?
 
 Thanks,
 Nalini
 
 



RE: Converting fq params to Filter object

2012-12-27 Thread Dyer, James
Nalini,

You could take the code from SpellCheckCollator#collate and have it issue a 
test query for each word individually instead of for each collation.  This 
would do exactly what you want. See 
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java

If you are concerned this isn't low-level enough and that performance would 
suffer, then see https://issues.apache.org/jira/browse/SOLR-3240 , which has a 
patch that uses a collector that quits after finding one document.  This makes 
each test query faster at the expense of not getting exact hit-counts.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 27, 2012 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Converting fq params to Filter object

Hi Eric,

Sorry, I think I wasn't very clear in explaining what we need to do.

We don't really need to do any complicated overriding, just want to change
the DirectSpellChecker to issue a query for every correction it finds *with
fq params from the original query taken into account* so that we can check
if the correction would actually result in some hits.

I was thinking of implementing this using the IndexSearcher.search(Query
query, Filter filter, int n) method where 'query' is a regular TermQuery
(the term is the correction) and 'filter' would represent the fq params.
What I'm not sure about is how to convert the fq params from Solr into a
Filter object and whether this is something we need to build ourselves or
if there's an existing API for this.

Also, I'm new to this code so not sure if I'm approaching this the wrong
way. Any advice/pointers are much appreciated.

Thanks,
Nalini



On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 I think the answer is yes, that there's a better way to doing all of this.
  But I'm not yet sure what this all entails in your situation.  What are
 you overriding with the Lucene searches?   I imagine Solr has the
 flexibility to handle what you're trying to do without overriding anything
 core in SolrIndexSearcher.

 Generally, the way to get a custom filter in place is to create a custom
 query parser and use that for your fq parameter, like fq={!myparser
 param1='some value'}possible+expression+if+needed, so maybe that helps?

 Tell us more about what you're doing specifically, and maybe we can guide
 you to a more elegant way to plug in any custom logic you want.

 Erik

 On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:

  Hi,
 
  I'm trying to figure out how to convert the fq params that are being
 passed
  to Solr into something that can be used to filter the results of a query
  that's being issued against the Lucene IndexSearcher (I'm modifying some
  Lucene code to issue the query so calling through to one of the
  SolrIndexSearcher methods would be ugly).
 
  Looks like one of the IndexSearcher.search(Query query, Filter filter,
 ...)
  methods would do what I want but I'm wondering if there's any easy way of
  converting the fq params into a Filter? Or is there a better way of doing
  all of this?
 
  Thanks,
  Nalini





Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi James,

Yup, that was what I tried to do initially but it seems like calling
through to those Solr methods from DirectSpellChecker was not a good idea -
am I wrong? And like you mentioned, this seemed like it wasn't low-level
enough.

Erik: Unfortunately the collate functionality does not work for our use
case since the queries we're correcting are default OR. Here's the original
thread about this -

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3ccamqozyftgiwyrbvwsdf0hfz1sznkq9gnbjfdb_obnelsmvr...@mail.gmail.com%3E

Thanks,
Nalini

On Thu, Dec 27, 2012 at 2:46 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 https://issues.apache.org/jira/browse/SOLR-3240


RE: Converting fq params to Filter object

2012-12-27 Thread Dyer, James
Nalini,

Assuming that you're using Solr, the hook into the collate functionality is in 
SpellCheckComponent#addCollationsToResponse .  To do what you want, you would 
have to modify the call to SpellCheckCollator to issue test queries against the 
individual words instead of the collations.

See 
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/handler/component/SpellCheckComponent.java

Of course if you're using Lucene directly and not Solr, then you would want to 
build a series of queries that each query one word with the filters applied.  
DirectSpellChecker#suggestSimilar returns an array of SuggestWord instances 
that contain the individual words you would want to try.  To optimize this, you 
can use the same approach as in SOLR-3240, implementing a Collector that only 
looks for 1 document then quits.
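The SOLR-3240 trick works by having the Collector's collect() method throw once the first document is seen, aborting the search as soon as existence is proven. Stripped of the Lucene types, the control flow looks like this (the exception plays the role of Lucene's early-termination exceptions; the match list simulates per-document collection):

```java
import java.util.List;

public class EarlyTerminationDemo {
    static class Terminated extends RuntimeException {}

    // Count how many docs are inspected before at least one match is proven,
    // aborting via an exception exactly like an early-terminating Collector.
    public static int docsInspectedUntilFirstMatch(List<Boolean> matches) {
        int inspected = 0;
        try {
            for (boolean isMatch : matches) {
                inspected++;
                if (isMatch) throw new Terminated(); // collect() quits after one hit
            }
        } catch (Terminated expected) {
            // search aborted early: a hit exists, the exact count is unknown
        }
        return inspected;
    }

    public static void main(String[] args) {
        List<Boolean> matches = new java.util.ArrayList<>();
        for (int i = 0; i < 10_000; i++) matches.add(i == 2); // first hit at doc 2
        System.out.println(docsInspectedUntilFirstMatch(matches)); // 3, not 10000
    }
}
```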

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 27, 2012 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Converting fq params to Filter object

Hi James,

Yup, that was what I tried to do initially but it seems like calling
through to those Solr methods from DirectSpellChecker was not a good idea -
am I wrong? And like you mentioned, this seemed like it wasn't low-level
enough.

Erik: Unfortunately the collate functionality does not work for our use
case since the queries we're correcting are default OR. Here's the original
thread about this -

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3ccamqozyftgiwyrbvwsdf0hfz1sznkq9gnbjfdb_obnelsmvr...@mail.gmail.com%3E

Thanks,
Nalini

On Thu, Dec 27, 2012 at 2:46 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 https://issues.apache.org/jira/browse/SOLR-3240



Re: search with spaces

2012-12-27 Thread Jack Krupansky

That's debugQuery=true or debug=query.

-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Thursday, December 27, 2012 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: search with spaces

Hi,

Add debugQuery=query to your search requests.  That will point you in the
right direction.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Thu, Dec 27, 2012 at 3:45 AM, Sangeetha sangeetha...@gmail.com wrote:


Hi,

I have a text field with value O O Jaane Jaane. When i search with 
*q=Jaane

Jaane* it is giving the results. But if i give *q=O O Jaane Jaane* it is
not
working? What could be the reason?

Thanks,
Sangeetha








Frequent OOM - (Unknown source in logs).

2012-12-27 Thread shreejay
Hello, 

I am seeing frequent OOMs for the past 2 days on a SolrCloud cluster
(Solr 4.0 with a patch from SOLR-2592): 3 shards, each shard with 2
instances, each instance running CentOS with 30GB memory and 500GB disk
space, with a separate ZooKeeper ensemble of 3.

Here is the stack trace: http://pastebin.com/cV5DxD4N

I also saw a Jira issue which looks similar, the difference being that in
the stack trace I get, I cannot see which line is doing the
expandCapacity:

  java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)

whereas the stack trace in that issue
(https://issues.apache.org/jira/browse/SOLR-3881) has:

  at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)

Has anyone seen this issue before? Any fixes for this?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361.html
Sent from the Solr - User mailing list archive at Nabble.com.


old index not cleaned up on the slave

2012-12-27 Thread Jason
Hi,
I'm using master/slave replication on Solr 4.0.
Replication runs successfully, but the old index is not cleaned up.
Is that a bug or not?

My slave index directory is below...

$ ls -l solr_kr/krg01/data/index/
total 23472512
-rw-r--r--. 1 tomcat tomcat  563722625 Dec 24 21:48 _15.fdt
-rw-r--r--. 1 tomcat tomcat    4855210 Dec 24 21:48 _15.fdx
-rw-r--r--. 1 tomcat tomcat       4155 Dec 24 22:01 _15.fnm
-rw-r--r--. 1 tomcat tomcat 3367203143 Dec 24 22:01 _15_Lucene40_0.frq
-rw-r--r--. 1 tomcat tomcat 6951612380 Dec 24 22:01 _15_Lucene40_0.prx
-rw-r--r--. 1 tomcat tomcat 1096591353 Dec 24 22:01 _15_Lucene40_0.tim
-rw-r--r--. 1 tomcat tomcat   26026916 Dec 24 22:01 _15_Lucene40_0.tip
-rw-r--r--. 1 tomcat tomcat        388 Dec 24 22:01 _15.si
-rw-r--r--. 1 tomcat tomcat         98 Nov 30 13:43 segments_3
-rw-r--r--. 1 tomcat tomcat         99 Dec 24 22:01 segments_4
-rw-r--r--. 1 tomcat tomcat         20 Aug 12 07:21 segments.gen
-rw-r--r--. 1 tomcat tomcat  563742324 Nov 30 13:32 _t.fdt
-rw-r--r--. 1 tomcat tomcat    4855210 Nov 30 13:32 _t.fdx
-rw-r--r--. 1 tomcat tomcat       4155 Nov 30 13:43 _t.fnm
-rw-r--r--. 1 tomcat tomcat 3382846438 Nov 30 13:43 _t_Lucene40_0.frq
-rw-r--r--. 1 tomcat tomcat 6951620034 Nov 30 13:43 _t_Lucene40_0.prx
-rw-r--r--. 1 tomcat tomcat 1096654275 Nov 30 13:43 _t_Lucene40_0.tim
-rw-r--r--. 1 tomcat tomcat   26027222 Nov 30 13:43 _t_Lucene40_0.tip
-rw-r--r--. 1 tomcat tomcat        379 Nov 30 13:43 _t.si




--
View this message in context: 
http://lucene.472066.n3.nabble.com/old-index-not-cleaned-up-on-the-slave-tp4029370.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr + jetty deployment issue

2012-12-27 Thread David Parks
Do you see any errors coming in on the console, stderr?

I start solr this way and redirect the stdout and stderr to log files, when
I have a problem stderr generally has the answer:

java \
-server \
-Djetty.port=8080 \
-Dsolr.solr.home=/opt/solr \
-Dsolr.data.dir=/mnt/solr_data \
-jar /opt/solr/start.jar > /opt/solr/logs/stdout.log \
2> /opt/solr/logs/stderr.log



-Original Message-
From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com] 
Sent: Thursday, December 27, 2012 7:40 PM
To: solr-user@lucene.apache.org
Subject: solr + jetty deployment issue

Hi,

I am having trouble with getting solr + jetty to work. I am following all
instructions to the letter from - http://wiki.apache.org/solr/SolrJetty. I
also created a work folder - /opt/solr/work. I am also setting tmpdir to a
new path in /etc/default/jetty . I am confirming the tmpdir is set to the
new path from admin dashboard, under args.

It works like a charm. But when I restart jetty multiple times, after 3/4
such restarts it starts hanging. Admin pages just don't load and my app fails
to acquire a connection with solr.

What might I be missing? Should I rather be looking at my code to see if I
am not committing correctly?

Please let me know if you have faced similar issue in the past and how to
tackle it.

Thank you.

--
Best Regards,
Sushrut



MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking id:1004401713626 as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626 of course. I tried q=id:[1004410713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
"TO" ...
<RANGEIN_QUOTED> ...
<RANGEIN_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <long name="facetime">0</long>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Hi David,

From what I see in the log and thread dump, it seems that the getSearcher method
in SolrCore is not able to acquire the required lock, and because of that it is
blocking startup of the server. Here is the thread dump -
http://pastebin.com/GPnAzF1q .

On Fri, Dec 28, 2012 at 8:01 AM, David Parks davidpark...@yahoo.com wrote:

 Do you see any errors coming in on the console, stderr?

 I start solr this way and redirect the stdout and stderr to log files, when
 I have a problem stderr generally has the answer:

 java \
 -server \
 -Djetty.port=8080 \
 -Dsolr.solr.home=/opt/solr \
 -Dsolr.data.dir=/mnt/solr_data \
 -jar /opt/solr/start.jar > /opt/solr/logs/stdout.log \
 2> /opt/solr/logs/stderr.log &



 -Original Message-
 From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
 Sent: Thursday, December 27, 2012 7:40 PM
 To: solr-user@lucene.apache.org
 Subject: solr + jetty deployment issue

 Hi,

 I am having trouble with getting solr + jetty to work. I am following all
 instructions to the letter from - http://wiki.apache.org/solr/SolrJetty. I
 also created a work folder - /opt/solr/work. I am also setting tmpdir to a
 new path in /etc/default/jetty . I am confirming the tmpdir is set to the
 new path from admin dashboard, under args.

 It works like a charm. But when I restart jetty multiple times, after 3/4
 such restarts it starts hanging. Admin pages just dont load and my app
 fails
 to acquire a connection with solr.

 What I might be missing? Should I be rather looking at my code and see if I
 am not committing correctly?

 Please let me know if you have faced similar issue in the past and how to
 tackle it.

 Thank you.

 --
 Best Regards,
 Sushrut




-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


Re: MoreLikeThis only returns 1 result

2012-12-27 Thread Jack Krupansky
Sounds like it is simply dispatching to the normal search request handler. 
Although you specified qt=mlt, make sure you enable the legacy select 
handler dispatching in solrconfig.xml.


Change:

   <requestDispatcher handleSelect="false" >

to

   <requestDispatcher handleSelect="true" >

Or, simply address the MLT handler directly:

   http://107.23.102.164:8080/solr/mlt?q=...

Or, use the MoreLikeThis search component:

   http://localhost:8983/solr/select?q=...&mlt=true&...

See:
http://wiki.apache.org/solr/MoreLikeThis
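For reference, the two solrconfig.xml fragments involved might look roughly like the following (a sketch based on the stock example config; the requestParsers attribute values are illustrative, not required):

```xml
<!-- Option 1: legacy dispatching, so requests to /select route on the qt parameter -->
<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
</requestDispatcher>

<!-- Option 2: register the MLT handler under its own path (note the leading slash) -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
```

With option 2 in place, qt=mlt is unnecessary; the request simply goes to /solr/mlt directly.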

-- Jack Krupansky

-Original Message- 
From: David Parks

Sent: Thursday, December 27, 2012 9:59 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis only returns 1 result

I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking id:1004401713626 as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626 of course. I tried q=id[1004410713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
   "TO" ...
   <RANGEIN_QUOTED> ...
   <RANGEIN_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <long name="facetime">0</long>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



RE: MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
Ok, that worked, I had the /mlt request handler misconfigured (forgot a
'/'). It's working now. Thanks!

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, December 28, 2012 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis only returns 1 result

Sounds like it is simply dispatching to the normal search request handler. 
Although you specified qt=mlt, make sure you enable the legacy select
handler dispatching in solrconfig.xml.

Change:

<requestDispatcher handleSelect="false" >

to

<requestDispatcher handleSelect="true" >

Or, simply address the MLT handler directly:

http://107.23.102.164:8080/solr/mlt?q=...

Or, use the MoreLikeThis search component:

http://localhost:8983/solr/select?q=...&mlt=true&...

See:
http://wiki.apache.org/solr/MoreLikeThis

-- Jack Krupansky

-Original Message-
From: David Parks
Sent: Thursday, December 27, 2012 9:59 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis only returns 1 result

I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking id:1004401713626 as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626 of course. I tried q=id[1004410713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
"TO" ...
<RANGEIN_QUOTED> ...
<RANGEIN_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <long name="facetime">0</long>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
I'm somewhat new to Solr (it's running, I've been through the books, but I'm
no master). What I hear you say is that MLT *can* accept, say 5, documents
and provide results, but the results would essentially be the same as
running the query 5 times for each document?

If that's the case, I might accept it. I would just have to merge them
together at the end (perhaps I'd take the top 2 of each result, for
example).

Being somewhat new I'm a little confused by the difference between a Search
Component and a Handler. I've got the /mlt handler working and I'm using
that. But how's that different from a Search Component? Is that referring
to the default /solr/select?q=... style query?

And if what I said about multiple documents above is correct, what's the
syntax to try that out?

Thanks very much for the great help!
Dave


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, December 26, 2012 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document that
the query matches.

The MLT search component returns similar documents for each of the documents
in the search results, but processes each search result base document one at
a time and keeps its similar documents segregated by each of the base
documents.

It sounds like you wanted to merge the base search results and then find
documents similar to that merged super-document. Is that what you were
really seeking, as opposed to what the MLT component does? Unfortunately,
you can't do that with the components as they are.

You would have to manually merge the values from the base documents and then
you could POST that text back to the MLT handler and find similar documents
using the posted text rather than a query. Kind of messy, but in theory that
should work.
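A rough sketch of that merge-and-POST approach. Host, port, and the field values are assumptions; the key point is that the MLT handler accepts the source text as a content stream (stream.body) in place of a q query:

```python
from urllib.parse import urlencode

# Text of the 5 base documents, fetched earlier (e.g. via /select) --
# placeholder values here -- merged into one "super-document".
merged_text = " ".join([
    "text of article 1 ...",
    "text of article 2 ...",
])

# Build the MLT request parameters; POSTing these to /solr/mlt asks the
# handler to find documents similar to merged_text itself.
params = urlencode({
    "mlt.fl": "item_name,short_description",  # fields to compare on
    "mlt.mintf": 2,                           # min term frequency
    "mlt.mindf": 5,                           # min document frequency
    "rows": 10,
    "stream.body": merged_text,               # the posted source text
})

url = "http://localhost:8080/solr/mlt"  # assumed host/port/handler path
print(url + "?" + params)
```

With stream.body present, the handler analyzes the posted text against the mlt.fl fields and returns one list of similar documents, which avoids merging five separate MLT result lists client-side.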

-- Jack Krupansky

-Original Message-
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query? 



Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Here is latest threaddump taken after setting up latest nightly build
version - apache-solr-4.1-2012-12-27_04-32-37 - http://pastebin.com/eum7CxX4

Kind of stuck with this for a few days now, so I could use a little help.

Here are more details on the issue -
1. Setting up jetty + solr using instructions -
http://wiki.apache.org/solr/SolrJetty
2. Initial install with clean data dirs goes smoothly.
3. I can connect to the server and index 10K+ documents without any issues. I
use 10 threads in my app to do so. Not experiencing any
concurrency/deadlock issues.
4. When I stop my app and then restart jetty, after a few restarts I get the
above mentioned threaddump and startup of the server stays blocked forever.
5. If I delete the data dir and start again, the problem goes away. But it
reappears on server restarts.

On Fri, Dec 28, 2012 at 9:03 AM, Sushrut Bidwai bidwai.sush...@gmail.com wrote:

 Hi David,

 From what I see in the log and threaddump it seems that getSearcher method
 in SolrCore is not able to acquire required lock and because of that its
 blocking startup of the server. Here is threaddump -
 http://pastebin.com/GPnAzF1q .


 On Fri, Dec 28, 2012 at 8:01 AM, David Parks davidpark...@yahoo.com wrote:

 Do you see any errors coming in on the console, stderr?

 I start solr this way and redirect the stdout and stderr to log files,
 when
 I have a problem stderr generally has the answer:

 java \
 -server \
 -Djetty.port=8080 \
 -Dsolr.solr.home=/opt/solr \
 -Dsolr.data.dir=/mnt/solr_data \
 -jar /opt/solr/start.jar > /opt/solr/logs/stdout.log \
 2> /opt/solr/logs/stderr.log &



 -Original Message-
 From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
 Sent: Thursday, December 27, 2012 7:40 PM
 To: solr-user@lucene.apache.org
 Subject: solr + jetty deployment issue

 Hi,

 I am having trouble with getting solr + jetty to work. I am following all
 instructions to the letter from - http://wiki.apache.org/solr/SolrJetty.
 I
 also created a work folder - /opt/solr/work. I am also setting tmpdir to a
 new path in /etc/default/jetty . I am confirming the tmpdir is set to the
 new path from admin dashboard, under args.

 It works like a charm. But when I restart jetty multiple times, after 3/4
 such restarts it starts hanging. Admin pages just dont load and my app
 fails
 to acquire a connection with solr.

 What I might be missing? Should I be rather looking at my code and see if
 I
 am not committing correctly?

 Please let me know if you have faced similar issue in the past and how to
 tackle it.

 Thank you.

 --
 Best Regards,
 Sushrut




 --
 Best Regards,
 Sushrut
 http://sushrutbidwai.com




-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread Otis Gospodnetic
Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how
they are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, David Parks davidpark...@yahoo.com wrote:

 I'm somewhat new to Solr (it's running, I've been through the books, but
 I'm
 no master). What I hear you say is that MLT *can* accept, say 5, documents
 and provide results, but the results would essentially be the same as
 running the query 5 times for each document?

 If that's the case, I might accept it. I would just have to merge them
 together at the end (perhaps I'd take the top 2 of each result, for
 example).

 Being somewhat new I'm a little confused by the difference between a
 Search
 Component and a Handler. I've got the /mlt handler working and I'm using
 that. But how's that different from a Search Component? Is that referring
 to the default /solr/select?q=... style query?

 And if what I said about multiple documents above is correct, what's the
 syntax to try that out?

 Thanks very much for the great help!
 Dave


 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com]
 Sent: Wednesday, December 26, 2012 12:07 PM
 To: solr-user@lucene.apache.org
 Subject: Re: MoreLikeThis supporting multiple document IDs as input?

 MLT has both a request handler and a search component.

 The MLT handler returns similar documents only for the first document that
 the query matches.

 The MLT search component returns similar documents for each of the
 documents
 in the search results, but processes each search result base document one
 at
 a time and keeps its similar documents segregated by each of the base
 documents.

 It sounds like you wanted to merge the base search results and then find
 documents similar to that merged super-document. Is that what you were
 really seeking, as opposed to what the MLT component does? Unfortunately,
 you can't do that with the components as they are.

 You would have to manually merge the values from the base documents and
 then
 you could POST that text back to the MLT handler and find similar documents
 using the posted text rather than a query. Kind of messy, but in theory
 that
 should work.

 -- Jack Krupansky

 -Original Message-
 From: David Parks
 Sent: Tuesday, December 25, 2012 5:04 AM
 To: solr-user@lucene.apache.org
 Subject: MoreLikeThis supporting multiple document IDs as input?

 I'm unclear on this point from the documentation. Is it possible to give
 Solr X # of document IDs and tell it that I want documents similar to those
 X documents?

 Example:

   - The user is browsing 5 different articles
   - I send Solr the IDs of these 5 articles so I can present the user other
 similar articles

 I see this example for sending it 1 document ID:
 http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

 But can I send it 2+ document IDs as the query?




Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
If I comment out the /browse requestHandler from solrconfig.xml, the problem
goes away. So the issue is definitely with the way I am configuring
solrconfig.xml. I will debug it on my side.

On Fri, Dec 28, 2012 at 11:55 AM, Sushrut Bidwai
bidwai.sush...@gmail.com wrote:

 Here is latest threaddump taken after setting up latest nightly build
 version - apache-solr-4.1-2012-12-27_04-32-37 -
 http://pastebin.com/eum7CxX4

 Kind of stuck with this from few days now, so can use little help.

 Here is more details on the issue -
 1. Setting up jetty + solr using instructions -
 http://wiki.apache.org/solr/SolrJetty
 2. Initial install with clean data dirs goes smoothly.
 3. I can connect to server and index 10K+ documents with out any issues. I
 use 10 threads in my app to do so. Not experiencing any
 concurrency/deadlock issues.
 4. When stop my app and then restart jetty, after few restarts - I get
 above mentioned threaddump and startup of server stays blocked forever.
 5. If I delete data dir and start again, problem goes away. But reappears
 on server restarts.


 On Fri, Dec 28, 2012 at 9:03 AM, Sushrut Bidwai 
 bidwai.sush...@gmail.com wrote:

 Hi David,

 From what I see in the log and threaddump it seems that getSearcher
 method in SolrCore is not able to acquire required lock and because of that
 its blocking startup of the server. Here is threaddump -
 http://pastebin.com/GPnAzF1q .


 On Fri, Dec 28, 2012 at 8:01 AM, David Parks davidpark...@yahoo.com wrote:

 Do you see any errors coming in on the console, stderr?

 I start solr this way and redirect the stdout and stderr to log files,
 when
 I have a problem stderr generally has the answer:

 java \
 -server \
 -Djetty.port=8080 \
 -Dsolr.solr.home=/opt/solr \
 -Dsolr.data.dir=/mnt/solr_data \
 -jar /opt/solr/start.jar > /opt/solr/logs/stdout.log \
 2> /opt/solr/logs/stderr.log &



 -Original Message-
 From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
 Sent: Thursday, December 27, 2012 7:40 PM
 To: solr-user@lucene.apache.org
 Subject: solr + jetty deployment issue

 Hi,

 I am having trouble with getting solr + jetty to work. I am following all
 instructions to the letter from - http://wiki.apache.org/solr/SolrJetty.
 I
 also created a work folder - /opt/solr/work. I am also setting tmpdir to
 a
 new path in /etc/default/jetty . I am confirming the tmpdir is set to the
 new path from admin dashboard, under args.

 It works like a charm. But when I restart jetty multiple times, after 3/4
 such restarts it starts hanging. Admin pages just dont load and my app
 fails
 to acquire a connection with solr.

 What I might be missing? Should I be rather looking at my code and see
 if I
 am not committing correctly?

 Please let me know if you have faced similar issue in the past and how to
 tackle it.

 Thank you.

 --
 Best Regards,
 Sushrut




 --
 Best Regards,
 Sushrut
 http://sushrutbidwai.com




 --
 Best Regards,
 Sushrut
 http://sushrutbidwai.com




-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide what and
whether to take action.

So in the case of the MLT component this was said:

 The MLT search component returns similar documents for each of the 
 documents in the search results, but processes each search result base 
 document one at a time and keeps its similar documents segregated by 
 each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the q parameter and returns a result (the q=id:123456 ensures that the
Query Component will return just this one document).

The MltComponent then looks at the result from the QueryComponent and
generates its results.

The part that is still confusing is understanding the difference between
these two comments:

 - The MLT search component returns similar documents for each of the
documents in the search results
 - The MLT handler returns similar documents only for the first document
that the query matches.
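As a configuration sketch of that chain (component names per the stock solrconfig.xml; treat it as illustrative rather than a verified config), the handler lists its components in order, QueryComponent runs first, and the MLT component is then triggered by mlt=true on the request:

```xml
<searchComponent name="mlt" class="solr.MoreLikeThisComponent" />

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>  <!-- QueryComponent: builds the base result list -->
    <str>mlt</str>    <!-- MoreLikeThisComponent: similar docs per base result -->
  </arr>
</requestHandler>
```

The /mlt handler, by contrast, is a standalone MoreLikeThisHandler that only looks at the first document q matches, which is exactly the difference the two quoted comments describe.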



-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, David Parks davidpark...@yahoo.com wrote:

 I'm somewhat new to Solr (it's running, I've been through the books, 
 but I'm no master). What I hear you say is that MLT *can* accept, say 
 5, documents and provide results, but the results would essentially be 
 the same as running the query 5 times for each document?

 If that's the case, I might accept it. I would just have to merge them 
 together at the end (perhaps I'd take the top 2 of each result, for 
 example).

 Being somewhat new I'm a little confused by the difference between a 
 Search Component and a Handler. I've got the /mlt handler working 
 and I'm using that. But how's that different from a Search 
 Component? Is that referring to the default /solr/select?q=... 
 style query?

 And if what I said about multiple documents above is correct, what's 
 the syntax to try that out?

 Thanks very much for the great help!
 Dave


 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com]
 Sent: Wednesday, December 26, 2012 12:07 PM
 To: solr-user@lucene.apache.org
 Subject: Re: MoreLikeThis supporting multiple document IDs as input?

 MLT has both a request handler and a search component.

 The MLT handler returns similar documents only for the first document 
 that the query matches.

 The MLT search component returns similar documents for each of the 
 documents in the search results, but processes each search result base 
 document one at a time and keeps its similar documents segregated by 
 each of the base documents.

 It sounds like you wanted to merge the base search results and then 
 find documents similar to that merged super-document. Is that what you 
 were really seeking, as opposed to what the MLT component does? 
 Unfortunately, you can't do that with the components as they are.

 You would have to manually merge the values from the base documents 
 and then you could POST that text back to the MLT handler and find 
 similar documents using the posted text rather than a query. Kind of 
 messy, but in theory that should work.

 -- Jack Krupansky

 -Original Message-
 From: David Parks
 Sent: Tuesday, December 25, 2012 5:04 AM
 To: solr-user@lucene.apache.org
 Subject: MoreLikeThis supporting multiple document IDs as input?

 I'm unclear on this point from the documentation. Is it possible to 
 give Solr X # of document IDs and tell it that I want documents 
 similar to those X documents?

 Example:

   - The user is browsing 5 different articles
   - I send Solr the IDs of these 5 articles so I can present the user 
 other similar articles

 I see this example for sending it 1 document ID:
 http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

 But can I send it 2+ document IDs as the query?