Re: Solr 4.2.1 Branch

2013-04-05 Thread Jagdish Nomula
That works out. Thanks for shooting the link.

On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupansky wrote:

> You want the "tagged" branch:
>
> https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1
>
>
> -- Jack Krupansky
>
> -Original Message- From: Jagdish Nomula Sent: Friday, April 05,
> 2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch
> Hello,
>
> I was trying to get hold of solr 4.2.1 branch on github. I see
> https://github.com/apache/lucene-solr/tree/lucene_solr_4_2. I don't see
> any branch for 4.2.1. Am I missing anything?
>
> Thanks in advance for your help.
>
> --
> *Jagdish Nomula*
>
> Sr. Manager Search
> Simply Hired, Inc.
> 370 San Aleso Ave., Ste 200
> Sunnyvale, CA 94085
>
> office - 408.400.4700
> cell - 408.431.2916
> email - jagd...@simplyhired.com 
>
> www.simplyhired.com
>



-- 
*Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


Re: Solr 4.2.1 Branch

2013-04-05 Thread Jack Krupansky

You want the "tagged" branch:

https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1

-- Jack Krupansky

-Original Message- 
From: Jagdish Nomula 
Sent: Friday, April 05, 2013 8:36 PM 
To: solr-user@lucene.apache.org 
Subject: Solr 4.2.1 Branch 


Hello,

I was trying to get hold of solr 4.2.1 branch on github. I see
https://github.com/apache/lucene-solr/tree/lucene_solr_4_2.  I don't see
any branch for 4.2.1. Am I missing anything?

Thanks in advance for your help.

--
*Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


Re: Solr 4.2.1 Branch

2013-04-05 Thread Jack Krupansky

You want the "tagged" branch:
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_2_1/

-- Jack Krupansky

-Original Message- 
From: Jagdish Nomula 
Sent: Friday, April 05, 2013 8:36 PM 
To: solr-user@lucene.apache.org 
Subject: Solr 4.2.1 Branch 


Hello,

I was trying to get hold of solr 4.2.1 branch on github. I see
https://github.com/apache/lucene-solr/tree/lucene_solr_4_2.  I don't see
any branch for 4.2.1. Am I missing anything?

Thanks in advance for your help.

--
*Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


Solr 4.2.1 Branch

2013-04-05 Thread Jagdish Nomula
Hello,

I was trying to get hold of solr 4.2.1 branch on github. I see
https://github.com/apache/lucene-solr/tree/lucene_solr_4_2.  I don't see
any branch for 4.2.1. Am I missing anything?

Thanks in advance for your help.

-- 
*Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


RE: Solr Multiword Search

2013-04-05 Thread Dyer, James
To get "did-you-mean" suggestions, use both "spellcheck.alternativeTermCount" > 
0 along with "spellcheck.maxResultsForSuggest" > 0.  Set this latter parameter 
to the max # of hits you want to trigger "did-you-mean" suggestions.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
for both parameters.  Also, you might find the discussion from when this was 
developed helpful:  https://issues.apache.org/jira/browse/SOLR-2585 . 
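
For reference, a bare-bones sketch of how those two parameters might sit in a handler's spellcheck defaults (the handler name and values here are just illustrative, not from any particular setup):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- suggest alternatives even for terms that exist in the index -->
    <str name="spellcheck.alternativeTermCount">5</str>
    <!-- only trigger "did-you-mean" when the query returns this many hits or fewer -->
    <str name="spellcheck.maxResultsForSuggest">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```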

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: skmirch [mailto:skmi...@hotmail.com] 
Sent: Friday, April 05, 2013 2:21 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Multiword Search

Hi James,
Thanks for the very useful tips, however, I am looking for searches that
produce collations.

I need a functionality where someone searching for "madona" sees results for
"madona" and also get collations for "madonna".  So a functionality like
"Did you mean" can be provided.   We need exact matches and provide
suggestions if better ones exist from within our catalog?

What I am seeing right now is that when searching for "madona", "madona" is
returned but there are no collations for "madonna" appearing.  I am using
DirectSolrSpellChecker and have minQueryFrequency set at 0.01 . In theory it
should produce some collations for madonna.

I am not seeing any.
Not sure what I need to do for this?  I would appreciate any help.
Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Multiword-Search-tp4053038p4054130.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Please add me: FuadEfendi

2013-04-05 Thread Steve Rowe
On Apr 5, 2013, at 4:34 PM, Fuad Efendi  wrote:
> Few months ago I was able to modify Wiki; I can't do it now, probably
> because http://wiki.apache.org/solr/ContributorsGroup
> 
> Please add me: FuadEfendi

Added to solr wiki ContributorsGroup.


contributor group

2013-04-05 Thread Fuad Efendi
Hi,

Please add me: FuadEfendi

Thanks!




-- 
http://www.tokenizer.ca






Please add me: FuadEfendi

2013-04-05 Thread Fuad Efendi
Hi,

Few months ago I was able to modify Wiki; I can't do it now, probably
because http://wiki.apache.org/solr/ContributorsGroup
 
Please add me: FuadEfendi


Thanks!


-- 
Fuad Efendi, PhD, CEO
C: (416)993-2060
F: (416)800-6479
Tokenizer Inc., Canada
http://www.tokenizer.ca






Re: Solr 4.2 - Unexpected behaviour when updating a document with only id field specified in the update

2013-04-05 Thread Curtis Beattie
Thanks Jack & Shawn.

As per Jack's comment, if I add update="set" to my "id" field, solr
does not remove/replace the document:
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '



doc1

'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
12

true
*:*
xml




doc1
Document 1

A

1431508208721068032




As you can see in this example there was no change, which is exactly
what Jack is saying. Although only specifying the "id" in an update is
a little pathological, if one wanted to be extra cautious they could
prevent Solr from deleting an existing document by adding
update="set" to their "id" field.

I discovered this behaviour because an ORM tool I am using was
incorrectly issuing a Solr update when none of the fields were
modified and thus only the "id" field was sent in the update request.
But the result was unexpected in that we were losing the contents of
documents in our Solr core.

I have to agree with Shawn that there would be value in having an
<update> XML element, as this is more intent-revealing. The current
behaviour is really add_or_update: what if I really don't want
add_or_update semantics and instead want update_or_fail?

Anyway, thanks for the help gentlemen.

On Fri, Apr 5, 2013 at 4:08 PM, Jack Krupansky  wrote:
> Since you don't have any "update" attribute specified, you are doing a
> simple "add" - which deletes the old document with that key and replaces it
> with the data from the "add" document.
>
> Again: It is the presence of the "update" attribute that turns the document
> <add> into an "update"; otherwise <add> simply replaces any existing document
> or adds a new document.
>
> -- Jack Krupansky
>
> -Original Message- From: Curtis Beattie
> Sent: Friday, April 05, 2013 2:52 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.2 - Unexpected behaviour when updating a document with only
> id field specified in the update
>
>
> I am experiencing some peculiar behavior when updating a document. I'm
> curious whether this is "working as intended" or whether it is a
> defect. Allow me to articulate the problem using an example (should be
> easily reproducible with the "example" configuration data).
>
> The workflow is as follows:
>
> 1) Create a document with fields: id, name_s and keywords_ss (works as
> expected).
> 2) Update the document by specifying id and replacing keywords_ss
> (works as expected).
> 3) Update the document by only specifying id (unusual behavior:
> document is "wiped")
>
>
> Step #1 - Create the document
> curl http://localhost:1/solr/simple-collection/update?commit=true
> -H "Content-Type: text/xml" -d '
> 
> 
>
>doc1
>Document 1
>A
>
> '
>
> http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> 
> 
> 0
> 14
> 
> true
> *:*
> xml
> 
> 
> 
> 
> doc1
> Document 1
> 
> A
> 
> 1431502565339561984
> 
> 
> 
>
> Step #2 - Update the document specifying id & keywords_ss
> curl http://localhost:1/solr/simple-collection/update?commit=true
> -H "Content-Type: text/xml" -d '
> 
> 
>
>doc1
>B
>
> '
>
> http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> 
> 
> 0
> 13
> 
> true
> *:*
> xml
> 
> 
> 
> 
> doc1
> Document 1
> 
> B
> 
> 1431502700990693376
> 
> 
> 
>
> Step #3 - Update the document specifying only 'id'
> curl http://localhost:1/solr/simple-collection/update?commit=true
> -H "Content-Type: text/xml" -d '
> 
> 
>
>doc1
>
> '
>
> http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> 
> 
> 0
> 14
> 
> true
> *:*
> xml
> 
> 
> 
> 
> doc1
> 1431502818264481792
> 
> 
> 
>
> ---
>
> Now I realize that "updating" a document and specifying only the 'id'
> is pointless but the unusual behavior, in my view, is that in this
> circumstance Solr seems to be deleting the 'name_s' field. In fact,
> all fields except 'id' are lost. The unusual behaviour, in my view, is
> that Solr will perform an update when at least one field (other than
> 'id') is specified but when only 'id' is specified it seems to be
> deleting and re-adding the document without preserving the existing
> data.
>
> Can someone please comment on this behaviour and indicate whether or
> not it is in fact correct or if it represents a defect?
>
> Thanks,
> --
> Curt



-- 
Curt


Re: Flow Chart of Solr

2013-04-05 Thread Furkan KAMACI
I have read the Solr and Lucene books and wikis, and I had to debug the code
to find which parts come from where. I will tidy up my notes and share the
big-picture flow and the detailed one. After that I will ask you for your
opinions, thanks.


2013/4/5 Erick Erickson 

> Then there's my lazy method. Fire up the IDE and find a test case that
> looks close to something you want to understand further. Step through
> it all in the debugger. I admit there'll be some fumbling at the start
> to _find_ the test case, but they're pretty well named. In IntelliJ,
> all you have to do is right-click on the test case and the context
> menu says "debug blahbalbhabl" You can chart the class
> relationships you actually wind up in as you go. This seems tedious,
> but it saves me getting lost in the class hierarchy.
>
> Also, there are some convenient tools in the IDE that will show you
> class hierarchies as you need.
>
> Or attach your debugger to a running Solr, which is actually very
> easy. In IntelliJ (and Eclipse has something very similar), create a
> "remote" project. That'll specify some parameters you start up with,
> e.g.:
> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
> -jar start.jar
>
> Now start up the remote debugging session you just created in the IDE
> and you are attached to a live solr instance and able to step through
> any code you want.
>
> Either way, you can make the IDE work for you!
>
> FWIW,
> Erick
>
> On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky 
> wrote:
> > We're using the 4.x branch code as the basis for our writing. So,
> > effectively it will be for at least 4.3 when the book comes out in the
> > summer.
> >
> > Early access will be in about a month or so. O'Reilly will be showing a
> > galley proof for 200 pages of the book next week at Big Data TechCon next
> > week in Boston.
> >
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Jack Park
> > Sent: Wednesday, April 03, 2013 12:56 PM
> >
> > To: solr-user@lucene.apache.org
> > Subject: Re: Flow Chart of Solr
> >
> > Jack,
> >
> > Is that new book up to the 4.+ series?
> >
> > Thanks
> > The other Jack
> >
> > On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky 
> > wrote:
> >>
> >> And another one on the way:
> >>
> >>
> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
> >>
> >> Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Jack Park
> >> Sent: Wednesday, April 03, 2013 11:25 AM
> >>
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Flow Chart of Solr
> >>
> >> There are three books on Solr, two with that in the title, and one,
> >> Taming Text, each of which have been very valuable in understanding
> >> Solr.
> >>
> >> Jack
> >>
> >> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky  >
> >> wrote:
> >>>
> >>>
> >>> Sure, yes. But... it comes down to what level of detail you want and
> need
> >>> for a specific task. In other words, there are probably a dozen or more
> >>> levels of detail. The reality is that if you are going to work at the
> >>> Solr
> >>> code level, that is very, very different than being a "user" of Solr,
> and
> >>> at
> >>> that point your first step is to become familiar with the code itself.
> >>>
> >>> When you talk about "parsing" and "stemming", you are really talking
> >>> about
> >>> the user-level, not the Solr code level. Maybe what you really need is
> a
> >>> cheat sheet that maps a user-visible feature to the main Solr code
> >>> component
> >>> for that implements that user feature.
> >>>
> >>> There are a number of different forms of "parsing" in Solr - parsing of
> >>> what? Queries? Requests? Solr documents? Function queries?
> >>>
> >>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
> >>> that.
> >>> Lucene does all of the "token filtering". Are you asking for details on
> >>> how
> >>> Lucene works? Maybe you meant to ask how "term analysis" works, which
> is
> >>> split between Solr and Lucene. Or maybe you simply wanted to know when
> >>> and
> >>> where term analysis is done. Tell us your specific problem or specific
> >>> question and we can probably quickly give you an answer.
> >>>
> >>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
> >>> user-level
> >>> diagrams, but not down to the code level.
> >>>
> >>> If you could focus on specific questions, we could give you specific
> >>> answers.
> >>>
> >>> "Main steps"? That depends on what level you are working at. Tell us
> what
> >>> problem you are trying to solve and we can point you to the relevant
> >>> areas.
> >>>
> >>> In truth, if you become generally familiar with Solr at the user level
> >>> (study the wikis), you will already know what the "main steps" are.
> >>>
> >>> So, it is not "main steps of Solr", but main steps of some specific
> >>> "request" of Solr, and for a specified level of detail, and for a

Re: Filtering Search Cloud

2013-04-05 Thread Furkan KAMACI
Ok, I will test and give you a detailed report for it, thanks for your help.


2013/4/5 Erick Erickson 

> I cannot emphasize strongly enough that you need to _prove_ you have
> a problem before you decide on a solution! Do you have any evidence
> that solrcloud can't handle the load you intend? Might a better approach
> be just to create more shards thus spreading the load and get all the
> HA/DR goodness of SolrCloud?
>
> So far you've said you'll have a "heavy" load without giving us any
> numbers.
> 10,000 update/second? 10 updates/second? 1 query/second? 100,000
> queries/second? 100,000 documents? 1,000,000,000,000 documents?
>
> Best
> Erick
>
> On Wed, Apr 3, 2013 at 5:15 PM, Shawn Heisey  wrote:
> > On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
> >> Thanks for your explanation, you explained every thing what I need. Just
> >> one more question. I see that I can not make it with Solr Cloud, but I
> can
> >> do something like that with master-slave replication of Solr. If I use
> >> master-slave replication of Solr, can I eliminate (filter) something
> >> (something that is indexed from master) from being a response after
> >> querying (querying from slaves) ?
> >
> > I don't understand the question.  I will attempt to give you more
> > information, but it might not answer your question.  If not, you'll have
> > to try to improve your question.
> >
> > Your master and each of that master's slaves will have the same index as
> > soon as replication is done.  A query on the slave has no idea that the
> > master exists.
> >
> > Thanks,
> > Shawn
> >
>


Re: Solr 4.2 - Unexpected behaviour when updating a document with only id field specified in the update

2013-04-05 Thread Jack Krupansky
Since you don't have any "update" attribute specified, you are doing a 
simple "add" - which deletes the old document with that key and replaces it 
with the data from the "add" document.


Again: It is the presence of the "update" attribute that turns the document
<add> into an "update"; otherwise <add> simply replaces any existing document
or adds a new document.


-- Jack Krupansky

-Original Message- 
From: Curtis Beattie

Sent: Friday, April 05, 2013 2:52 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.2 - Unexpected behaviour when updating a document with only 
id field specified in the update


I am experiencing some peculiar behavior when updating a document. I'm
curious whether this is "working as intended" or whether it is a
defect. Allow me to articulate the problem using an example (should be
easily reproducible with the "example" configuration data).

The workflow is as follows:

1) Create a document with fields: id, name_s and keywords_ss (works as
expected).
2) Update the document by specifying id and replacing keywords_ss
(works as expected).
3) Update the document by only specifying id (unusual behavior:
document is "wiped")


Step #1 - Create the document
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '


   
   doc1
   Document 1
   A
   
'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
14

true
*:*
xml




doc1
Document 1

A

1431502565339561984




Step #2 - Update the document specifying id & keywords_ss
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '


   
   doc1
   B
   
'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
13

true
*:*
xml




doc1
Document 1

B

1431502700990693376




Step #3 - Update the document specifying only 'id'
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '


   
   doc1
   
'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
14

true
*:*
xml




doc1
1431502818264481792




---

Now I realize that "updating" a document and specifying only the 'id'
is pointless but the unusual behavior, in my view, is that in this
circumstance Solr seems to be deleting the 'name_s' field. In fact,
all fields except 'id' are lost. The unusual behaviour, in my view, is
that Solr will perform an update when at least one field (other than
'id') is specified but when only 'id' is specified it seems to be
deleting and re-adding the document without preserving the existing
data.

Can someone please comment on this behaviour and indicate whether or
not it is in fact correct or if it represents a defect?

Thanks,
--
Curt 



Re: Solr 4.2 - Unexpected behaviour when updating a document with only id field specified in the update

2013-04-05 Thread Shawn Heisey

On 4/5/2013 12:52 PM, Curtis Beattie wrote:

I am experiencing some peculiar behavior when updating a document. I'm
curious whether this is "working as intended" or whether it is a
defect. Allow me to articulate the problem using an example (should be
easily reproducible with the "example" configuration data).

The workflow is as follows:

1) Create a document with fields: id, name_s and keywords_ss (works as
expected).
2) Update the document by specifying id and replacing keywords_ss
(works as expected).
3) Update the document by only specifying id (unusual behavior:
document is "wiped")


When you do your third step, nothing in the XML you are sending says 
"update."  There is no way for Solr to know that you intended it to be a 
fragment specifying a partial document update.  Because it looks like a 
new document with only one field defined, Solr does what it has always 
done - delete the old one and add the new one.  I think it's working as 
intended.


If it can be implemented without major headaches, I think that partial 
document updates should use a different XML tag than "add" ... like 
"update."


Thanks,
Shawn



Re: unknown field error when indexing with nutch

2013-04-05 Thread Jack Krupansky
It could also be a parameter being sent from Nutch. Check the Nutch doc for 
the Nutch-to-Solr interface. Maybe YOU are supposed to add a "host" field to 
your Solr schema.
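
Something along these lines in schema.xml would be the usual shape (the type and attributes here are a guess at a sensible default; check the schema-solr4.xml that ships with Nutch for the exact definition):

```xml
<!-- field the Nutch indexer writes; exact attributes per Nutch's schema-solr4.xml -->
<field name="host" type="string" stored="false" indexed="true"/>
```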


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Friday, April 05, 2013 3:46 PM
To: solr-user@lucene.apache.org
Subject: Re: unknown field error when indexing with nutch

I'm using the solrconfig supplied with Solr 4.2 and I added the Nutch
request handler. But I keep getting the same errors.
On Apr 5, 2013 8:11 PM, "Jack Krupansky"  wrote:


Check your solrconfig.xml file for references to a "host" field.

But maybe more importantly, make sure you use a Solr 4.1 solrconfig and
merge in any of your application-specific changes.

-- Jack Krupansky

-Original Message- From: Amit Sela
Sent: Friday, April 05, 2013 12:57 PM
To: solr-user@lucene.apache.org
Subject: unknown field error when indexing with nutch

Hi all,

I'm trying to run a nutch crawler and index to Solr.
I'm running Nutch 1.6 and Solr 4.2.

I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
can't seem to manage to run it with Solr 4.2

I re-built Nutch with the schema-solr4.xml and copied that file to
SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails
when
trying to index:

SolrException: ERROR: [doc=
http://0movies.com/watchversion.php?id=3818&link=1364879137]
unknown field
'host'

It looks like Solr is not aware of the schema... Did I miss something ?

Thanks.





Re: unknown field error when indexing with nutch

2013-04-05 Thread Amit Sela
I'm using the solrconfig supplied with Solr 4.2 and I added the Nutch
request handler. But I keep getting the same errors.
 On Apr 5, 2013 8:11 PM, "Jack Krupansky"  wrote:

> Check your solrconfig.xml file for references to a "host" field.
>
> But maybe more importantly, make sure you use a Solr 4.1 solrconfig and
> merge in any of your application-specific changes.
>
> -- Jack Krupansky
>
> -Original Message- From: Amit Sela
> Sent: Friday, April 05, 2013 12:57 PM
> To: solr-user@lucene.apache.org
> Subject: unknown field error when indexing with nutch
>
> Hi all,
>
> I'm trying to run a nutch crawler and index to Solr.
> I'm running Nutch 1.6 and Solr 4.2.
>
> I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
> can't seem to manage to run it with Solr 4.2
>
> I re-built Nutch with the schema-solr4.xml and copied that file to
> SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails
> when
> trying to index:
>
> SolrException: ERROR: [doc=
> http://0movies.com/watchversion.php?id=3818&link=1364879137]
> unknown field
> 'host'
>
> It looks like Solr is not aware of the schema... Did I miss something ?
>
> Thanks.
>


RE: Solr Multiword Search

2013-04-05 Thread skmirch
Hi James,
Thanks for the very useful tips, however, I am looking for searches that
produce collations.

I need a functionality where someone searching for "madona" sees results for
"madona" and also get collations for "madonna".  So a functionality like
"Did you mean" can be provided.   We need exact matches and provide
suggestions if better ones exist from within our catalog?

What I am seeing right now is that when searching for "madona", "madona" is
returned but there are no collations for "madonna" appearing.  I am using
DirectSolrSpellChecker and have minQueryFrequency set at 0.01 . In theory it
should produce some collations for madonna.
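
(For reference, the collation-related request parameters involved would look roughly like this; values are illustrative, and spellcheck.collate has to be on for collations to appear at all:)

```xml
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">5</str>
<!-- re-run candidate collations as queries and keep only ones that return hits -->
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.collateExtendedResults">true</str>
```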

I am not seeing any.
Not sure what I need to do for this?  I would appreciate any help.
Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Multiword-Search-tp4053038p4054130.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.2 - Unexpected behaviour when updating a document with only id field specified in the update

2013-04-05 Thread Curtis Beattie
I am experiencing some peculiar behavior when updating a document. I'm
curious whether this is "working as intended" or whether it is a
defect. Allow me to articulate the problem using an example (should be
easily reproducible with the "example" configuration data).

The workflow is as follows:

1) Create a document with fields: id, name_s and keywords_ss (works as
expected).
2) Update the document by specifying id and replacing keywords_ss
(works as expected).
3) Update the document by only specifying id (unusual behavior:
document is "wiped")


Step #1 - Create the document
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '



doc1
Document 1
A

'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
14

true
*:*
xml




doc1
Document 1

A

1431502565339561984




Step #2 - Update the document specifying id & keywords_ss
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '



doc1
B

'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
13

true
*:*
xml




doc1
Document 1

B

1431502700990693376




Step #3 - Update the document specifying only 'id'
curl http://localhost:1/solr/simple-collection/update?commit=true
-H "Content-Type: text/xml" -d '



doc1

'

http://localhost:1/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true


0
14

true
*:*
xml




doc1
1431502818264481792




---

Now I realize that "updating" a document and specifying only the 'id'
is pointless but the unusual behavior, in my view, is that in this
circumstance Solr seems to be deleting the 'name_s' field. In fact,
all fields except 'id' are lost. The unusual behaviour, in my view, is
that Solr will perform an update when at least one field (other than
'id') is specified but when only 'id' is specified it seems to be
deleting and re-adding the document without preserving the existing
data.

Can someone please comment on this behaviour and indicate whether or
not it is in fact correct or if it represents a defect?

Thanks,
--
Curt


RE: Score after boost & before

2013-04-05 Thread Swati Swoboda
http://explain.solr.pl/ might help you out with parsing out the response to see 
how boosts are affecting the scores. Take a look at some of the 
history/examples:

http://explain.solr.pl/explains/7kjl0ids


-Original Message-
From: abhayd [mailto:ajdabhol...@hotmail.com] 
Sent: Friday, April 05, 2013 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Score after boost & before

we do that now, but that's very time consuming.

Also we want our QA to have that info available on search result page in debug 
mode.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Score-after-boost-before-tp4054052p4054102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.2.0 seems messing up with ping operaton...

2013-04-05 Thread hyrax
Hi all,
I just moved from Solr 4.0.0 to Solr 4.2.0. BTW, I'm using SolrCloud (ports
8983 and 7574).
So when I was trying to ping the cloud by
http://localhost:8983/solr/admin/ping?wt=json, I got:
{"responseHeader":{"status":0,"QTime":2,"params":{"df":"text","echoParams":"all","rows":"10","echoParams":"all","wt":"json","q":"solrpingquery","distrib":"false"}},"status":"OK"}
{"responseHeader":{"status":0,"QTime":1,"params":{"df":"text","echoParams":"all","rows":"10","echoParams":"all","wt":"json","q":"solrpingquery","distrib":"false"}},"status":"OK"}
which seems to kindly provide info for all Solr instances, but this is
invalid JSON!!!
When using 4.0.0, it returned only one response, but now there are two.
And it will also return an error page if you want a xml format response by
http://localhost:8983/solr/admin/ping?wt=xml which used to work fine under
4.0.0.
Could anyone help me with this ping issue?
Many many thanks in advance!
Regards,
Hao



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-0-seems-messing-up-with-ping-operaton-tp4054117.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: detailed Error reporting in Solr

2013-04-05 Thread Walter Underwood
It is not a bug. XML parsers are required to reject documents with undefined 
character entities.

Try parsing it as HTML or XHTML.
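
A quick illustration of the difference, as a minimal sketch using only Python's standard library (the &rsaquo; entity stands in for whatever the crawled page contained):

```python
import html
import xml.etree.ElementTree as ET

snippet = '<li>item &rsaquo; detail</li>'

# A strict XML parser must reject the character entity &rsaquo;,
# which XML does not define (XML predefines only five entities).
try:
    ET.fromstring(snippet)
    xml_parsed = True
except ET.ParseError:
    xml_parsed = False

# An HTML-aware step knows the HTML named entities and resolves
# &rsaquo; to the character U+203A.
resolved = html.unescape(snippet)

print(xml_parsed)            # False: XML parse fails on the undefined entity
print('\u203a' in resolved)  # True: HTML unescaping resolves &rsaquo;
```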

wunder

On Apr 4, 2013, at 11:14 AM, eShard wrote:

> Yes, that's it exactly.
> I crawled a link with these ( ›) in each list item; solr
> couldn't handle it, threw the xml parse error, and the crawler terminated the
> job.
> 
> Is this fixable? Or do I have to submit a bug to the tika folks?
> 
> Thanks,
> 






Re: unknown field error when indexing with nutch

2013-04-05 Thread Jack Krupansky

Check your solrconfig.xml file for references to a "host" field.

But maybe more importantly, make sure you use a Solr 4.1 solrconfig and 
merge in any of your application-specific changes.


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Friday, April 05, 2013 12:57 PM
To: solr-user@lucene.apache.org
Subject: unknown field error when indexing with nutch

Hi all,

I'm trying to run a nutch crawler and index to Solr.
I'm running Nutch 1.6 and Solr 4.2.

I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
can't seem to manage to run it with Solr 4.2

I re-built Nutch with the schema-solr4.xml and copied that file to
SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails when
trying to index:

SolrException: ERROR: [doc=
http://0movies.com/watchversion.php?id=3818&link=1364879137] unknown field
'host'

It looks like Solr is not aware of the schema... Did I miss something ?

Thanks. 



unknown field error when indexing with nutch

2013-04-05 Thread Amit Sela
Hi all,

I'm trying to run a nutch crawler and index to Solr.
I'm running Nutch 1.6 and Solr 4.2.

I managed to crawl and index with that Nutch version into Solr 3.6.2 but I
can't seem to manage to run it with Solr 4.2

I re-built Nutch with the schema-solr4.xml and copied that file to
SOLR_HOME/example/solr/collection1/conf/schema.xml but the job fails when
trying to index:

SolrException: ERROR: [doc=
http://0movies.com/watchversion.php?id=3818&link=1364879137] unknown field
'host'

It looks like Solr is not aware of the schema... Did I miss something ?

Thanks.


Re: Score after boost & before

2013-04-05 Thread abhayd
we do that now, but that's very time consuming.

Also we want our QA to have that info available on search result page in
debug mode.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Score-after-boost-before-tp4054052p4054102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Score after boost & before

2013-04-05 Thread Gora Mohanty
On 5 April 2013 18:11, abhayd  wrote:
> hi
>
> I am using edismax and boosting certain fields using bq during query time.
>
> I would like to compare effect of boost side by side with original score
> without boost. Is there anyway i can get original score without boosting?

Do not think that is possible.

Could you not compare queries with/without boosts side-by-side in two
browser windows?

Regards,
Gora


Re: merging query results with data from another source

2013-04-05 Thread Erick Erickson
You're probably looking at two custom components. To get started, look
in solrconfig.xml
for first-component and last-component. BTW, writing custom components isn't
all that hard once you get to the place to start...

Anyway, your first component reaches out to your other data source and
"does the right thing" to the incoming query based on what it finds
in the external source.

You either write a last-component or use a DocTransformer (new in 4.x)
to change the document on the way out. DocTransformer is probably best,
although I haven't personally written one yet, they seem to be The New Way.
There's nothing equivalent for the query bits that I know of, though.

Best
Erick

On Fri, Apr 5, 2013 at 8:28 AM, Maciej Liżewski  wrote:
> Ok., my case is like this: I have Solr index with some documents that must
> be left intact. I also need to store somewhere else some data related to
> documents in Solr (it can be SQL database or another Solr core).
>
> In other words - I need to have some data stored independently of the main Solr
> index (for example tagging, user-rating, etc), but I need then to use such
> data in queries to the Solr index.
>
> Now - what I need to extend/replace to be able to:
>
> 1)  filter Solr queries with such remote data (I can fetch IDs of
> documents that should be listed and I need to intersect it somehow with
> query results)?
>
> 2)  Somehow extend returned results (documents itself or as additional
> section in response similar to highlighter) and provide related data (from
> external source) with selected documents.
>
>
>
> Any help appreciated.
>
>
>
> --
>
> Maciej Liżewski
>
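What Erick's first-component suggestion amounts to, conceptually, is intersecting the document IDs allowed by the external store with the documents Solr matched (in a real component you would usually translate the external IDs into a filter query rather than post-filter). A minimal, Solr-independent sketch of that intersection step, with made-up class and document names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExternalFilterSketch {
    // Keep only the Solr hits whose IDs the external store allows,
    // preserving Solr's ranking order.
    public static List<String> intersect(List<String> solrHits, Set<String> allowedIds) {
        List<String> filtered = new ArrayList<String>();
        for (String id : solrHits) {
            if (allowedIds.contains(id)) {
                filtered.add(id);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        // IDs as ranked by Solr, and IDs fetched from the external source
        List<String> solrHits = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        Set<String> allowed = new HashSet<String>(Arrays.asList("doc2", "doc4", "doc9"));
        System.out.println(intersect(solrHits, allowed)); // [doc2, doc4]
    }
}
```

Inside an actual first-component the same set lookup would instead drive a filter added to the request, so Solr does the intersection itself.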


Need Help for schema definition

2013-04-05 Thread contact_pub...@mail-impact.com

Hi all,

well i'm totally newbies on solr, and I need some help.

Ok raw definition of my needs :

I have a product database, with ordinary fields to describe a product. 
Name, reference, description, large description, product specifications, 
categories etc...


The needs :

1 - Being able to search through product name, description, 
specification, and reference
2 - Being able to quickly find all products from a category. For now it 
gives me more results than expected.
3 - Being able to find, in a result set, all the facets corresponding to 
the product specifications (e.g. number of products made of wood, number of 
products with a diameter of 20cm or in a range). I'm looking for an automatic 
process that tells me the 5 most frequent specifications in the result set and 
the number of products for each.


4 - Last but not least: I have a particular type of product (spare 
parts), for which I need to be able to:

- find them by brand
- find them by name
- find them by reference
- compatible model: the compatible models are in the description 
field and need to be processed with regular expressions to make a list of 
the different compatible models (original text ex.: spare part for Pompe HG 
large model, model HGS v5)

I used DataImportHandler to retrieve the data, and it seems to be 
good for the first range of products; however, I need to adjust the 
tokenizer and filters because they are too strict for now.


For the second set of data, I created a second entity in 
data-import-config.xml adding the data to the same fields, but it 
doesn't fit my needs, as the results are mixed and I can't select a 
specific entity to search in.


Thanks in advance for your help

David





Re: Streaming search results

2013-04-05 Thread Erick Erickson
Haven't used it myself, but:
https://issues.apache.org/jira/browse/SOLR-2112 seems to fit the bill
at least for SolrJ.

On Wed, Apr 3, 2013 at 5:20 PM, Victor Miroshnikov  
wrote:
> Is it possible to stream search results from Solr? Seems that this feature is 
> missing.
>
> I see two options to solve this:
>
> 1. Using search results pagination feature
> The idea is to implement a smart proxy that will stream chunks from search 
> results using pagination.
>
> 2. Implement Solr plugin with search streaming feature (is that possible at 
> all?)
>
> First option is easy to implement and reliable, though I don't know what
> the drawbacks are.
>
> Regards,
> Viktor
>
>


Re: Filtering Search Cloud

2013-04-05 Thread Erick Erickson
I cannot emphasize strongly enough that you need to _prove_ you have
a problem before you decide on a solution! Do you have any evidence
that solrcloud can't handle the load you intend? Might a better approach
be just to create more shards thus spreading the load and get all the
HA/DR goodness of SolrCloud?

So far you've said you'll have a "heavy" load without giving us any numbers.
10,000 update/second? 10 updates/second? 1 query/second? 100,000
queries/second? 100,000 documents? 1,000,000,000,000 documents?

Best
Erick

On Wed, Apr 3, 2013 at 5:15 PM, Shawn Heisey  wrote:
> On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
>> Thanks for your explanation, you explained every thing what I need. Just
>> one more question. I see that I can not make it with Solr Cloud, but I can
>> do something like that with master-slave replication of Solr. If I use
>> master-slave replication of Solr, can I eliminate (filter) something
>> (something that is indexed from master) from being a response after
>> querying (querying from slaves) ?
>
> I don't understand the question.  I will attempt to give you more
> information, but it might not answer your question.  If not, you'll have
> to try to improve your question.
>
> Your master and each of that master's slaves will have the same index as
> soon as replication is done.  A query on the slave has no idea that the
> master exists.
>
> Thanks,
> Shawn
>


Re: Flow Chart of Solr

2013-04-05 Thread Erick Erickson
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
it all in the debugger. I admit there'll be some fumbling at the start
to _find_ the test case, but they're pretty well named. In IntelliJ,
all you have to do is right-click on the test case and the context
menu says "debug blahbalbhabl" You can chart the class
relationships you actually wind up in as you go. This seems tedious,
but it saves me getting lost in the class hierarchy.

Also, there are some convenient tools in the IDE that will show you
class hierarchies as you need.

Or attach your debugger to a running Solr, which is actually very
easy. In IntelliJ (and Eclipse has something very similar), create a
"remote" project. That'll specify some parameters you start up with,
e.g.:
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
-jar start.jar

Now start up the remote debugging session you just created in the IDE
and you are attached to a live solr instance and able to step through
any code you want.

Either way, you can make the IDE work for you!

FWIW,
Erick

On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky  wrote:
> We're using the 4.x branch code as the basis for our writing. So,
> effectively it will be for at least 4.3 when the book comes out in the
> summer.
>
> Early access will be in about a month or so. O'Reilly will be showing a
> galley proof for 200 pages of the book next week at Big Data TechCon next
> week in Boston.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Park
> Sent: Wednesday, April 03, 2013 12:56 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Flow Chart of Solr
>
> Jack,
>
> Is that new book up to the 4.+ series?
>
> Thanks
> The other Jack
>
> On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky 
> wrote:
>>
>> And another one on the way:
>>
>> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
>>
>> Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Jack Park
>> Sent: Wednesday, April 03, 2013 11:25 AM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flow Chart of Solr
>>
>> There are three books on Solr, two with that in the title, and one,
>> Taming Text, each of which have been very valuable in understanding
>> Solr.
>>
>> Jack
>>
>> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky 
>> wrote:
>>>
>>>
>>> Sure, yes. But... it comes down to what level of detail you want and need
>>> for a specific task. In other words, there are probably a dozen or more
>>> levels of detail. The reality is that if you are going to work at the
>>> Solr
>>> code level, that is very, very different than being a "user" of Solr, and
>>> at
>>> that point your first step is to become familiar with the code itself.
>>>
>>> When you talk about "parsing" and "stemming", you are really talking
>>> about
>>> the user-level, not the Solr code level. Maybe what you really need is a
>>> cheat sheet that maps a user-visible feature to the main Solr code
>>> component
> that implements that user feature.
>>>
>>> There are a number of different forms of "parsing" in Solr - parsing of
>>> what? Queries? Requests? Solr documents? Function queries?
>>>
>>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
>>> that.
>>> Lucene does all of the "token filtering". Are you asking for details on
>>> how
>>> Lucene works? Maybe you meant to ask how "term analysis" works, which is
>>> split between Solr and Lucene. Or maybe you simply wanted to know when
>>> and
>>> where term analysis is done. Tell us your specific problem or specific
>>> question and we can probably quickly give you an answer.
>>>
>>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
>>> user-level
>>> diagrams, but not down to the code level.
>>>
>>> If you could focus on specific questions, we could give you specific
>>> answers.
>>>
>>> "Main steps"? That depends on what level you are working at. Tell us what
>>> problem you are trying to solve and we can point you to the relevant
>>> areas.
>>>
>>> In truth, if you become generally familiar with Solr at the user level
>>> (study the wikis), you will already know what the "main steps" are.
>>>
>>> So, it is not "main steps of Solr", but main steps of some specific
>>> "request" of Solr, and for a specified level of detail, and for a
>>> specified
>>> area of Solr if greater detail is needed. Be more specific, and then we
>>> can
>>> be more specific.
>>>
>>> For now, the general advice for people who need or want to go far beyond
>>> the
>>> user level is to "get familiar with the code" - just LOOK at it - a lot
>>> of
>>> the package and class names are OBVIOUS, really, and follow the class
>>> hierarchy and code flow using the standard features of any modern Java
>>> IDE.
>>> If you are wondering where to start for some specific user-level feature,

merging query results with data from another source

2013-04-05 Thread Maciej Liżewski
Ok., my case is like this: I have Solr index with some documents that must
be left intact. I also need to store somewhere else some data related to
documents in Solr (it can be SQL database or another Solr core).

In other words - I need to have some data stored independently of the main Solr
index (for example tagging, user-rating, etc), but I need then to use such
data in queries to the Solr index.

Now - what I need to extend/replace to be able to:

1)  filter Solr queries with such remote data (I can fetch IDs of
documents that should be listed and I need to intersect it somehow with
query results)?

2)  Somehow extend returned results (documents itself or as additional
section in response similar to highlighter) and provide related data (from
external source) with selected documents.

 

Any help appreciated.

 

--

Maciej Liżewski



Re: RequestHandler.. Conditional components

2013-04-05 Thread Erick Erickson
I think we need a better explanation of the issue. It's not
clear whether you're talking about combining Solr & external
results or just doing one or the other.

Best
Erick

On Tue, Apr 2, 2013 at 8:30 PM, venkata  wrote:
> In our use cases, for certain query terms, we want to redirect the query
> processing to an external system,
> and for the rest of the keywords we want to continue with the query component,
> facets, etc.
>
> Based on some condition, is it possible to skip some components in a request
> handler?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/RequestHandler-Conditional-components-tp4053381.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Collection's Size

2013-04-05 Thread Alexandre Rafalovitch
I'd add rows=0, just to avoid the actual records serialization if size is
all that matters.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Apr 5, 2013 at 8:26 AM, Jack Krupansky wrote:

> Query for "*:*" and look at the number of documents found.
>
> -- Jack Krupansky
>
> -Original Message- From: Ranjith Venkatesan
> Sent: Friday, April 05, 2013 2:06 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Collection's Size
>
>
> Hi,
>
> I am new to solr. I want to find size of collection dynamically via solrj.
> I
> tried many ways but i couldnt succeed in any of those. Pls help me with
> this
> issue.
>
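Putting the two replies together: query "*:*", set rows=0, and read numFound from the response. As a raw request (the core name collection1 is an assumption), the count comes back in the result element even though no documents are serialized:

```
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=xml

<response>
  ...
  <result name="response" numFound="..." start="0"/>
</response>
```

In SolrJ the equivalent is building a SolrQuery for "*:*", calling setRows(0), and reading response.getResults().getNumFound() from the QueryResponse.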


Re: A request handler that manipulated the index

2013-04-05 Thread Erick Erickson
I _think_, and I'm really stretching here, that you'd be OK if you
shared a single
index writer amongst all the requests. Having more than one index writer
going against the same index is a definite no-no.

But atomic updates are a request (admittedly an update request), so it must be
possible; that code might provide some hints.

Erick@NotSureButItSeemsReasonable

On Tue, Apr 2, 2013 at 1:06 PM, Benson Margulies  wrote:
> I am thinking about trying to structure a problem as a Solr plugin. The
> nature of the plugin is that it would need to read and write the lucene
> index to do its work. It could not be cleanly split into URP 'over here'
> and a Search Component 'over there'.
>
> Are there invariants of Solr that would preclude this, like assumptions in
> the implementation of the cache?


Re: Need Help in Patching OPENNLP

2013-04-05 Thread Erick Erickson
Gora: Thanks for pitching in, I'm on vacation and only sporadically
looking at the lists.

Karthicrnair:
https is the access to the writeable archive, it's been long enough
since I set things up that I don't remember if you need committer
credentials or not, so try straight http (without the 's') maybe?

Erick

On Tue, Apr 2, 2013 at 1:01 AM, karthicrnair  wrote:
> Thanks much !!
>
> Explorer -- Internet Explorer :) Sorry for the miscommunication. Yeah let me
> check it once again.
>
> appreciate all the help :)
>
> krn
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053094.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Score after boost & before

2013-04-05 Thread abhayd
hi 

I am using edismax and boosting certain fields using bq during query time.

I would like to compare effect of boost side by side with original score
without boost. Is there anyway i can get original score without boosting?

thanks
abhay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Score-after-boost-before-tp4054052.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Collection's Size

2013-04-05 Thread Jack Krupansky

Query for "*:*" and look at the number of documents found.

-- Jack Krupansky

-Original Message- 
From: Ranjith Venkatesan

Sent: Friday, April 05, 2013 2:06 AM
To: solr-user@lucene.apache.org
Subject: Solr Collection's Size

Hi,

I am new to solr. I want to find size of collection dynamically via solrj. I
tried many ways but i couldnt succeed in any of those. Pls help me with this
issue.


Thanks in advance

Ranjith Venkatesan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Collection-s-Size-tp4054011.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr Collection's Size

2013-04-05 Thread Ranjith Venkatesan
Hi,

I am new to Solr. I want to find the size of a collection dynamically via SolrJ.
I tried many ways but I couldn't succeed with any of them. Please help me with
this issue.


Thanks in advance

Ranjith Venkatesan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Collection-s-Size-tp4054011.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr spell suggestions help

2013-04-05 Thread Rohan Thakur
hi all

I have some questions about Solr spell suggestions.

1) First of all, I wanted to know: is the index-based spellchecker better in
any way than the DirectSolrSpellChecker that Solr 4.1 provides?

2) Then I wanted to know: is there a way I can get suggestions for words by
providing only a prefix of the word? Like when I query "sam" I should get
"samsung" as one of the suggestions.

3) Also, I wanted to know why I am not getting suggestions for words that have
more than a 2-character difference. If I query for "wirlpool", which has 8
characters, I get the suggestion "whirlpool", which is 9 characters and the
correct spelling; but when I query for "wirlpol", which is 7 characters, it
says the spelling is wrong but does not show any suggestions. Likewise, if I
search for "pansonic" (8 chars) it suggests "panasonic" (9 chars), but when I
remove one more character, that is, search for "panonic" (7 chars), it does
not return any suggestions. How can I correct this? Even when I search for
"ipo" it does not return "ipod" as a suggestion.

4) One more thing I want to clear up: when I search for "microwave ovan" it
does not flag any misspelling; even though "ovan" is wrong, it returns the
results for "microwave", saying the query is correct. This is the case
whenever one of the terms in the query is correct while the others are
incorrect: it does not point out the misspelled one but returns the results
for the correct word. How can I correct this? The case is similar when I query
"microvave oven": it shows the results for "oven", saying the query is
correct.

5) One more case: when I query "plntronies" (correct word: "plantronics"), it
does not return any suggestion, but when I query "plantronies" it returns
"plantronics" as a suggestion. Why is that happening?

my schema.xml is:

  [the field and analyzer definitions were stripped by the list archive
  and could not be recovered]

my solrconfig.xml is [the archive stripped the XML tags; they are
reconstructed here from the surviving values, which line up with the stock
Solr 4 spellcheck example]:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.3</float>
    <int name="maxEdits">1</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">3</int>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>

  <str name="queryAnalyzerFieldType">tSpell</str>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">spell</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>



thanks in advance
regards
Rohan
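The failures in question 3 line up with an edit-distance cutoff: each failing pair is 2 or more edits from its correction, and DirectSolrSpellChecker only considers candidates within maxEdits (at most 2, and only 1 if configured so), while "ipo" additionally falls below a minQueryLength of 4 — which the numbers in the posted config suggest. A quick standalone Levenshtein check (plain Java, hypothetical class name, independent of Solr) shows the distances involved:

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance.
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("wirlpool", "whirlpool"));      // 1 -> suggested
        System.out.println(levenshtein("wirlpol", "whirlpool"));       // 2 -> beyond maxEdits=1
        System.out.println(levenshtein("panonic", "panasonic"));       // 2 -> beyond maxEdits=1
        System.out.println(levenshtein("plantronies", "plantronics")); // 1 -> suggested
    }
}
```

This also explains case 5: "plantronies" is 1 edit from "plantronics" and gets a suggestion, while "plntronies" is 2 edits away and does not.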


Re: how to avoid single character to get indexed for directspellchecker dictionary

2013-04-05 Thread Rohan Thakur
hi james

I have tried using LengthFilterFactory as well, and it seems it is removing
the single characters from the index, but when I query for "delll" it is
still giving "dell l" in the suggestions. I think this is because, when
querying a term like "dell l", Solr can find results: it tokenizes "dell" and
"l" and returns the documents containing "dell". So to remove such
suggestions, do I have to use minBreakLength? And what is the significance of
the minBreakLength number?


On Fri, Apr 5, 2013 at 12:20 PM, Rohan Thakur  wrote:

> hi james
>
> after using this it's working fine for "delll" but not for "de". What does
> this minBreakLength signify?
>
>
> Also, can you tell me why I am not getting suggestions for shorter words?
> For "del" I should get "dell" as a suggestion, but it's not giving any
> suggestions. And can I get complete-the-word suggestions,
> i.e. if I type "sams" it should also suggest "samsung"?
>
> thanks
> regards
> Rohan
>
>
>
>
> On Fri, Apr 5, 2013 at 12:54 AM, Dyer, James  > wrote:
>
>> I assume if your user queries "delll" and it breaks it into pieces like
>> "de l l l", then you're probably using WordBreakSolrSpellChecker in
>> addition to DirectSolrSpellChecker, right?  If so, then you can specify
>> "minBreakLength" in solrconfig.xml like this:
>>
>> 
>> ... spellcheckers here ...
>> 
>>   wordbreak
>>   solr.WordBreakSolrSpellChecker
>>   ... parameters here ...
>> 5
>> 
>> 
>>
>> One note is that both DirectSolrSpellChecker and
>> WordBreakSolrSpellChecker operate directly on the terms dictionary and do
>> not have a separate dictionary like IndexBasedSpellChecker.  The only way
>> to prevent a word from being in the dictionary then is to filter this out
>> in the analysis chain.  For instance, if you use <copyField> to build a
>> field just for spellchecking, you can use LengthFilterFactory to remove the
>> short terms.  See
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory.
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -Original Message-
>> From: Rohan Thakur [mailto:rohan.i...@gmail.com]
>> Sent: Thursday, April 04, 2013 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: how to avoid single character to get indexed for
>> directspellchecker dictionary
>>
>> hi all
>>
>> I am using solr directspellcheker for spell suggestions using raw analyses
>> for indexing but I have some fields which have single characters like l L
>> so its is been indexed in the dictionary and when I am using this for
>> suggestions for query like delll its suggesting de and l l l as the spell
>> correction as my index has de and l as single characters in the fields.
>> please help.
>>
>> thanks
>> regards
>> Rohan
>>
>>
>
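A sketch of what James describes, as schema.xml fragments: a spellcheck field fed by <copyField>, whose analyzer drops 1-character tokens via LengthFilterFactory so they never enter the dictionary. The field, type, and source names here are illustrative, not from the original thread:

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop 1-character tokens such as the stray "l"/"L" values -->
    <filter class="solr.LengthFilterFactory" min="2" max="64"/>
  </analyzer>
</fieldType>

<field name="spell" type="textSpell" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="name" dest="spell"/>
```

Since DirectSolrSpellChecker and WordBreakSolrSpellChecker read the terms of this field directly, filtering at analysis time is the only place to keep short tokens out.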


Re: performance on concurrent search request

2013-04-05 Thread Anatoli Matuskova
Does anyone know how this is implemented?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-on-concurrent-search-request-tp4053182p4054030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: do SearchComponents have access to response contents

2013-04-05 Thread xavier jmlucjav
I knew I could do that at the Jetty level with a servlet, for instance, but the
user wants to do this inside the Solr code itself. Now that you mention
the logs... that could be a solution without modifying the webapp.

thanks for the input!
xavier


On Fri, Apr 5, 2013 at 7:55 AM, Amit Nithian  wrote:

> "We need to also track the size of the response (as the size in bytes of
> the
> whole xml response that is streamed, with stored fields and all). I was a
> bit worried cause I am wondering if a searchcomponent will actually have
> access to the response bytes..."
>
> ==> Can't you get this from your container access logs after the fact? I
> may be misunderstanding something but why wouldn't mining the Jetty/Tomcat
> logs for the response size here suffice?
>
> Thanks!
> Amit
>
>
> On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav 
> wrote:
>
> > A custom QueryResponseWriter...this makes sense, thanks Jack
> >
> >
> > On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky  > >wrote:
> >
> > > The search components can see the "response" as a namedlist, but it is
> > > only when SolrDispatchFIlter calls the QueryResponseWriter that XML or
> > JSON
> > > or whatever other format (Javabin as well) is generated from the named
> > list
> > > for final output in an HTTP response.
> > >
> > > You probably want a custom query response writer that wraps the XML
> > > response writer. Then you can generate the XML and then do whatever you
> > > want with it.
> > >
> > > The QueryResponseWriter class and  in
> > solrconfig.xml.
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: xavier jmlucjav
> > > Sent: Wednesday, April 03, 2013 4:22 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: do SearchComponents have access to response contents
> > >
> > >
> > > I need to implement some SearchComponent that will deal with metrics on
> > the
> > > response. Some things I see will be easy to get, like number of hits
> for
> > > instance, but I am more worried with this:
> > >
> > > We need to also track the size of the response (as the size in bytes of
> > the
> > > whole xml response that is streamed, with stored fields and all). I was
> a
> > > bit worried cause I am wondering if a searchcomponent will actually
> have
> > > access to the response bytes...
> > >
> > > Can someone confirm one way or the other? We are targeting Sorl4.0
> > >
> > > thanks
> > > xavier
> > >
> >
>
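On the response-size question: a wrapping QueryResponseWriter (as Jack suggested earlier in the thread) can count what it streams. The Solr wiring is omitted here; this plain java.io sketch shows just the counting wrapper such a writer could place around the Writer it is handed:

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Counts every character passing through to the wrapped Writer, the same
// way a custom QueryResponseWriter could wrap the Writer Solr gives it.
public class CountingWriter extends FilterWriter {
    private long count = 0;

    public CountingWriter(Writer out) { super(out); }

    @Override public void write(int c) throws IOException {
        super.write(c);
        count++;
    }

    @Override public void write(char[] cbuf, int off, int len) throws IOException {
        super.write(cbuf, off, len);
        count += len;
    }

    @Override public void write(String str, int off, int len) throws IOException {
        super.write(str, off, len);
        count += len;
    }

    public long getCount() { return count; }

    public static void main(String[] args) throws IOException {
        CountingWriter w = new CountingWriter(new StringWriter());
        w.write("<response>...</response>");
        System.out.println(w.getCount()); // 24
    }
}
```

Note this counts characters, not bytes; for the byte size you would count after charset encoding, or, as Amit suggests, simply read the response size from the container's access logs.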