Re: Search for misspelled words in corpus

2013-06-08 Thread Otis Gospodnetic
Hm, I was purposely avoiding mentioning ngrams because just ngramming
all indexed tokens would balloon the index. My assumption was that
only *some* words are misspelled, in which case it may be better not
to ngram all tokens.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sun, Jun 9, 2013 at 2:30 AM, Jagdish Nomula  wrote:
> Another theoretical answer for this question is the ngrams approach. You can
> index the word and its trigrams. Query the index by the string as well as
> its trigrams, with a % match search. You then pass the exhaustive result set
> through a more expensive scoring such as Smith-Waterman.
>
> Thanks,
>
> Jagdish
>
>
> On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant  wrote:
>
>> n-grams might help, followed by an edit distance metric such as Jaro-Winkler
>> or Smith-Waterman-Gotoh to further filter out.
>>
>>
>> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com
>> > wrote:
>>
>> > Interesting problem.  The first thing that comes to mind is to do
>> > "word expansion" during indexing.  Kind of like synonym expansion, but
>> > maybe a bit more dynamic. If you can have a dictionary of correctly
>> > spelled words, then for each token emitted by the tokenizer you could
>> > look up the dictionary and expand the token to all other words that
>> > are similar/close enough.  This would not be super fast, and you'd
>> > likely have to add some custom heuristic for figuring out what
>> > "similar/close enough" means, but it might work.
>> >
>> > I'd love to hear other ideas...
>> >
>> > Otis
>> > --
>> > Solr & ElasticSearch Support
>> > http://sematext.com/
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
>> >  wrote:
>> > > Hi,
>> > >
>> > > I have a problem where our text corpus on which we need to do search
>> > > contains many misspelled words. Same word could also be misspelled in
>> > > several different ways. It could also have documents that have correct
>> > > spellings. However, the search term that we give in query would always
>> be
>> > > correct spelling. Now when we search on a term, we would like to get
>> all
>> > > the documents that contain both correct and misspelled forms of the
>> > search
>> > > term.
>> > > We tried fuzzy search, but it doesn't work as per our expectations. It
>> > > returns any close match, not specifically misspelled words. For
>> example,
>> > if
>> > > I'm searching for a word like "fight", I would like to return the
>> > documents
>> > > that have words like "figth" and "feight", not documents with words
>> like
>> > > "sight" and "light".
>> > > Is there any suggested approach for doing this?
>> > >
>> > > regards,
>> > > Kamesh
>> >
>>
>
>
>
> --
> ***Jagdish Nomula*
> Sr. Manager Search
> Simply Hired, Inc.
> 370 San Aleso Ave., Ste 200
> Sunnyvale, CA 94085
>
> office - 408.400.4700
> cell - 408.431.2916
> email - jagd...@simplyhired.com 
>
> www.simplyhired.com
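The combined idea from this thread — a cheap trigram filter followed by a more expensive edit-distance pass, with a dictionary of correctly spelled words used to reject terms that are real words in their own right — can be sketched outside Solr. This is illustrative Python, not Solr code; the dictionary, thresholds, and toy term list are assumptions (plain Levenshtein stands in for the costlier Smith-Waterman scoring mentioned above):

```python
def trigrams(word):
    padded = "  %s " % word          # pad so short words still yield trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

DICTIONARY = {"fight", "sight", "light", "night"}    # assumed list of valid words
INDEXED_TERMS = ["fight", "figth", "feight", "sight", "light"]

def misspelling_search(query, min_overlap=0.5, max_dist=2):
    q = trigrams(query)
    hits = []
    for term in INDEXED_TERMS:
        if term != query and term in DICTIONARY:
            continue                                  # a different real word, not a misspelling
        if len(q & trigrams(term)) / len(q) < min_overlap:
            continue                                  # cheap trigram filter first
        if levenshtein(query, term) <= max_dist:      # expensive rescoring pass
            hits.append(term)
    return hits
```

Here "figth" and "feight" survive while "sight" and "light" are rejected as distinct dictionary words — the behavior Kamesh asked for, which a pure edit-distance fuzzy search cannot give.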


Velocity / Solritas not working in Solr 4.3 and Tomcat 6

2013-06-08 Thread andy tang
*Could anyone help me see why the Solritas page failed?*

*I can go to http://localhost:8080/solr without a problem, but fail to go to
http://localhost:8080/solr/browse*

*As below is the status report! Any help is appreciated.*

*Thanks!*

*Andy*

*
*

*type* Status report

*message* *{msg=lazy loading error,trace=org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2260)
at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getContentType(SolrCore.java:2279)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:623)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:879)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:617)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1760)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating Query Response Writer, solr.VelocityResponseWriter failed to instantiate org.apache.solr.response.QueryResponseWriter
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
at org.apache.solr.core.SolrCore.createQueryResponseWriter(SolrCore.java:604)
at org.apache.solr.core.SolrCore.access$200(SolrCore.java:131)
at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2255)
... 16 more
Caused by: java.lang.ClassCastException: class org.apache.solr.response.VelocityResponseWriter
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:458)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
... 19 more
,code=500}*

*description* *The server encountered an internal error that prevented it
from fulfilling this request.*


Re: Search for misspelled words in corpus

2013-06-08 Thread Jagdish Nomula
Another theoretical answer for this question is the ngrams approach. You can
index the word and its trigrams. Query the index by the string as well as
its trigrams, with a % match search. You then pass the exhaustive result set
through a more expensive scoring such as Smith-Waterman.

Thanks,

Jagdish


On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant  wrote:

> n-grams might help, followed by an edit distance metric such as Jaro-Winkler
> or Smith-Waterman-Gotoh to further filter out.
>
>
> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com
> > wrote:
>
> > Interesting problem.  The first thing that comes to mind is to do
> > "word expansion" during indexing.  Kind of like synonym expansion, but
> > maybe a bit more dynamic. If you can have a dictionary of correctly
> > spelled words, then for each token emitted by the tokenizer you could
> > look up the dictionary and expand the token to all other words that
> > are similar/close enough.  This would not be super fast, and you'd
> > likely have to add some custom heuristic for figuring out what
> > "similar/close enough" means, but it might work.
> >
> > I'd love to hear other ideas...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
> >  wrote:
> > > Hi,
> > >
> > > I have a problem where our text corpus on which we need to do search
> > > contains many misspelled words. Same word could also be misspelled in
> > > several different ways. It could also have documents that have correct
> > > spellings. However, the search term that we give in query would always
> be
> > > correct spelling. Now when we search on a term, we would like to get
> all
> > > the documents that contain both correct and misspelled forms of the
> > search
> > > term.
> > > We tried fuzzy search, but it doesn't work as per our expectations. It
> > > returns any close match, not specifically misspelled words. For
> example,
> > if
> > > I'm searching for a word like "fight", I would like to return the
> > documents
> > > that have words like "figth" and "feight", not documents with words
> like
> > > "sight" and "light".
> > > Is there any suggested approach for doing this?
> > >
> > > regards,
> > > Kamesh
> >
>



-- 
***Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


Re: load balancing internal Solr on Azure

2013-06-08 Thread Otis Gospodnetic
Hi Kevin,

Would http://search-lucene.com/?q=LBHttpSolrServer work for you?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, May 24, 2013 at 3:12 PM, Kevin Osborn  wrote:
> We are looking to install SolrCloud on Azure. We want it to be an internal
> service. For some applications that use SolrJ, we can use ZooKeeper. But
> for other applications that don't talk to Azure, we will need to go through
> a load balancer to distribute traffic among the Solr instances (VMs, IaaS).
>
> The problem is that Azure as far as I am aware does not have a load
> balancer for internal services. Internal endpoints are not load balanced.
>
> This is obviously not a problem specific to Solr, but I was hoping that
> other people might have some good ideas for addressing this issue. Thanks.
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]


Re: HyperLogLog for Solr

2013-06-08 Thread Otis Gospodnetic
I have not heard of anyone using HLL in Solr, but:

https://docs.google.com/presentation/d/1ESNiqd7HuIfuwXSSK81PAAu6AmEPEE0u_vyk4FU5x9o/present#slide=id.p
https://github.com/ptdavteam/elasticsearch-approx-plugin

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, May 28, 2013 at 2:43 AM, J Mohamed Zahoor  wrote:
> Hi
>
> Has anyone tried using HLL for finding unique values of a field in Solr?
> I am planning to use it for facet counts on certain fields to reduce the
> memory footprint.
>
>
>
> ./Zahoor
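For reference, the core of HyperLogLog is small. A rough self-contained sketch — deterministic MD5 hashing, 2^10 registers, and no small-/large-range corrections, so it is an approximation for illustration only, not Solr or plugin code:

```python
import hashlib

def hll_estimate(items, p=10):
    m = 1 << p                      # number of registers (2^p)
    registers = [0] * m
    for item in items:
        x = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        idx = x & (m - 1)           # low p bits choose a register
        w = x >> p                  # remaining bits feed the rank
        # rank = 1-based position of the lowest set bit (geometric distribution)
        rank = (w & -w).bit_length() if w else 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in registers)
```

Duplicates never change the registers, so the estimate depends only on the distinct values — which is why the memory footprint stays at m small counters instead of one entry per unique value, the property Zahoor is after for facet counting.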


Re: Note on The Book

2013-06-08 Thread Otis Gospodnetic
It's 2013 and people suffer from ADD.  Break it up into a la carte
chapter books.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky  wrote:
> Markus,
>
> Okay, more pages it is!
>
> -- Jack Krupansky
>
> -Original Message- From: Markus Jelsma
> Sent: Wednesday, May 29, 2013 5:35 PM
>
> To: solr-user@lucene.apache.org
> Subject: RE: Note on The Book
>
> Jack,
>
> I'd prefer tons of information instead of a meager 300 page book that leaves
> a lot of questions. I'm looking forward to a paperback or hardcover book and
> price doesn't really matter, it is going to be worth it anyway.
>
> Thanks,
> Markus
>
>
>
> -Original message-
>>
>> From:Jack Krupansky 
>> Sent: Wed 29-May-2013 15:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Note on The Book
>>
>> Erick, your point is well taken. Although my primary interest/skill is to
>> produce a solid foundation reference (including tons of examples), the
>> real
>> goal is to then build on top of that foundation.
>>
>> While I focus on the hard-core material - which really does include some
>> narrative and lots of examples in addition to tons of "mere" reference - my
>> co-author, Ryan Tabora, will focus almost exclusively on... narrative and
>> diagrams.
>>
>> And when I say reference, I also mean lots of examples. Even as the
>> hard-core reference stabilizes, the examples will continue to grow ("like
>> weeds!").
>>
>> Once we get the current, existing, under-review chapters packaged into the
>> new book and available for purchase and download (maybe Lulu, not decided)
>> - available in a couple of weeks - it will be updated approximately every
>> other week, both with additional reference material and additional
>> narrative and diagrams.
>>
>> One of our priorities (after we get through Stage 0 of the next few weeks)
>> is to in fact start giving each of the long Deep Dive Chapters enough
>> narrative lead to basically say exactly that - why you should care.
>>
>> A longer-term priority is to improve the balance of narrative and
>> hard-core
>> reference. Yeah, that will be a lot of pages. It already is. We were at
>> 907
>> pages and I was about to drop in another 166 pages on update handlers when
>> O'Reilly threw up their hands and pulled the plug. I was estimating 1200
>> pages at that stage. And I'll probably have another 60-80 pages on update
>> request processors within a week or so. With more to come. That did
>> include
>> a lot of hard-core material and example code for Lucene, which won't be in
>> the new Solr-only book. By focusing on an e-book the raw page count alone
>> becomes moot. We haven't given up on print - the intent is eventually to
>> have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3
>> to $5 each) and slimmer print volumes for people who don't need everything
>> in print.
>>
>> In fact, we will likely offer the revamped initial chapters of the book as
>> a
>> standalone introduction to Solr - narrative introduction ("why should you
>> care about Solr"), basic concepts of Lucene and Solr (and why you should
>> care!), brief tutorial walkthrough of the major feature areas of Solr, and
>> a
>> case study. The intent would be both e-book and a slim print volume (75
>> pages?).
>>
>> Another priority (beyond Stage 0) is to develop a detailed roadmap diagram
>> of Solr and how applications can use Solr, and then use that to show how
>> each of the Deep Dive sections fits in (heavy reference, but gradually
>> adding more narrative over time).
>>
>> We will probably be very open to requests - what people really wish a book
>> would actually do for them. The only request we won't be open to is to do
>> it
>> all in only 300 pages.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Erick Erickson
>> Sent: Wednesday, May 29, 2013 7:19 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Note on The Book
>>
>> FWIW, picking up on Alexandre's point. One of my continual
>> frustrations with virtually _all_
>> technical books is they become endless pages of details without ever
>> mentioning why
>> the hell I should care. Unfortunately, explaining use-cases for
>> everything would only make
>> the book about 10,000 pages long. Siiigh.
>>
>> I guess you can take this as a vote for narrative
>>
>> Erick
>>
>> On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky 
>> wrote:
>> > We'll have a blog for the book. We hope to have a first
>> > raw/rough/partial/draft published as an e-book in maybe 10 days to 2
>> > weeks.
>> > As soon as we get that process under control, we'll start the blog. I'll
>> > keep your email on file and keep you posted.
>> >
>> > -- Jack Krupansky
>> >
>> > -Original Message- From: Swati Swoboda
>> > Sent: Tuesday, May 28, 2013 1:36 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: RE: Note on The Book
>> >
>> >
>> > I'd definitely prefer the spiral bound as well. E-books are great and >
>> > your

Re: Search for misspelled words in corpus

2013-06-08 Thread Shashi Kant
n-grams might help, followed by an edit distance metric such as Jaro-Winkler
or Smith-Waterman-Gotoh to further filter out.


On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic  wrote:

> Interesting problem.  The first thing that comes to mind is to do
> "word expansion" during indexing.  Kind of like synonym expansion, but
> maybe a bit more dynamic. If you can have a dictionary of correctly
> spelled words, then for each token emitted by the tokenizer you could
> look up the dictionary and expand the token to all other words that
> are similar/close enough.  This would not be super fast, and you'd
> likely have to add some custom heuristic for figuring out what
> "similar/close enough" means, but it might work.
>
> I'd love to hear other ideas...
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
>  wrote:
> > Hi,
> >
> > I have a problem where our text corpus on which we need to do search
> > contains many misspelled words. Same word could also be misspelled in
> > several different ways. It could also have documents that have correct
> > spellings. However, the search term that we give in query would always be
> > correct spelling. Now when we search on a term, we would like to get all
> > the documents that contain both correct and misspelled forms of the
> search
> > term.
> > We tried fuzzy search, but it doesn't work as per our expectations. It
> > returns any close match, not specifically misspelled words. For example,
> if
> > I'm searching for a word like "fight", I would like to return the
> documents
> > that have words like "figth" and "feight", not documents with words like
> > "sight" and "light".
> > Is there any suggested approach for doing this?
> >
> > regards,
> > Kamesh
>


Re: Search for misspelled words in corpus

2013-06-08 Thread Otis Gospodnetic
Interesting problem.  The first thing that comes to mind is to do
"word expansion" during indexing.  Kind of like synonym expansion, but
maybe a bit more dynamic. If you can have a dictionary of correctly
spelled words, then for each token emitted by the tokenizer you could
look up the dictionary and expand the token to all other words that
are similar/close enough.  This would not be super fast, and you'd
likely have to add some custom heuristic for figuring out what
"similar/close enough" means, but it might work.

I'd love to hear other ideas...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
 wrote:
> Hi,
>
> I have a problem where our text corpus on which we need to do search
> contains many misspelled words. Same word could also be misspelled in
> several different ways. It could also have documents that have correct
> spellings. However, the search term that we give in query would always be
> correct spelling. Now when we search on a term, we would like to get all
> the documents that contain both correct and misspelled forms of the search
> term.
> We tried fuzzy search, but it doesn't work as per our expectations. It
> returns any close match, not specifically misspelled words. For example, if
> I'm searching for a word like "fight", I would like to return the documents
> that have words like "figth" and "feight", not documents with words like
> "sight" and "light".
> Is there any suggested approach for doing this?
>
> regards,
> Kamesh
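In Solr the "word expansion" Otis describes would be a custom TokenFilter (Java), but the index-time idea can be sketched in a few lines of Python. The dictionary and the "close enough" threshold are assumptions, exactly the heuristic he says you would have to tune:

```python
DICTIONARY = ["fight", "flight", "eight", "night"]   # assumed spelling dictionary
MAX_DIST = 2                                         # "close enough" heuristic

def levenshtein(a, b):
    # dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def expand(token):
    # emit the token itself plus every nearby dictionary word, so a
    # correctly spelled query term matches the misspelled document
    out = [token]
    out += [w for w in DICTIONARY
            if w != token and levenshtein(token, w) <= MAX_DIST]
    return out
```

Indexing the misspelled token "figth" would then also index "fight", so a correctly spelled query finds the document — at the cost of the extra lookups and index growth discussed in the thread.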


Dataless nodes in SolrCloud?

2013-06-08 Thread Otis Gospodnetic
Hi,

Is there a notion of a data-node vs. non-data node in SolrCloud?
Something a la http://www.elasticsearch.org/guide/reference/modules/node/


Thanks,
Otis
Solr & ElasticSearch Support
http://sematext.com/


Re: index merge question

2013-06-08 Thread Sourajit Basak
I have noticed that when I write a doc with an id that already exists, it
creates a new revision with only the fields from the second write. I
guess there is a REST API in the latest Solr version which updates only
selected fields.

In my opinion, merge should be creating a doc which is a union of the
fields assuming the fields are conforming to the schema of the output
index.

~ Sourajit


On Sun, Jun 9, 2013 at 12:06 AM, Mark Miller  wrote:

>
> On Jun 8, 2013, at 12:52 PM, Jamie Johnson  wrote:
>
> > When merging through the core admin (
> > http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
> > conflicts during the merge?  So for instance if I am merging core 1 and
> > core 2 into core 0 (first example), what happens if core 1 and core 2
> both
> > have a document with the same key, say core 1 has a newer version of core
> > 2?  Does the merge fail, does the newer document remain?
>
> You end up with both documents, both with that ID - not generally a
> situation you want to end up in. You need to ensure unique id's in the
> input data or replace the index rather than merging into it.
>
> >
> > Also if using the srcCore method if a document with key 1 is written
> while
> > an index also with key 1 is being merged what happens?
>
> It depends on the order I think - if the doc is written after the merge
> and it's an update, it will update the doc that was just merged in. If the
> merge comes second, you have the doc twice and it's a problem.
>
> - Mark


Re: Custom Data Clustering

2013-06-08 Thread Otis Gospodnetic
Hello,

This sounds like a custom SearchComponent.
Which clustering library you want to use or DIY is up to you, but go
with the SearchComponent approach.  You will still need to process N
hits, but you won't need to first send them all over the wire.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Jun 7, 2013 at 11:48 AM, Raheel Hasan  wrote:
> Hi,
>
> Can someone please tell me if there is a way to have a custom *`clustering
> of the data`* from `solr` 'query' results? I am facing 2 issues currently:
>
>  1. The `*Carrot*` clustering only applies clustering to the "paged"
> results (i.e. in the current pagination's page results).
>
>  2. I need to have custom clustering and classify results into certain
> classes only (i.e. only a few very specific words in the search results).
> Like, for example, "Red", "Green", "Blue" etc., and not "hello World",
> "Known World", "green world" etc. (if you know what I mean here) -
> where all of these words, both the Do and DoNot sets, exist in the search results.
>
> Please tell me how to achieve this. Perhaps Carrot/clustering is not needed
> here and some other classifier is needed. So what to do here?
>
> Basically, I cannot receive 1 million results and then process them via a
> PHP array to classify them as needed. The classification must be done
> here in Solr only.
>
> Thanks
>
> --
> Regards,
> Raheel Hasan
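As a toy model of the classification step Raheel describes — a fixed label set, matched as standalone tokens so phrases like "Known World" contribute nothing — the logic looks like this. In Solr it would live in a custom SearchComponent (Java) so it runs over the full result set rather than one page; the label set here is illustrative:

```python
CLASSES = {"red", "green", "blue"}        # the fixed set of allowed labels

def classify(doc_text):
    # label a document with every class word that appears as a
    # standalone token in its text
    tokens = set(doc_text.lower().split())
    return sorted(CLASSES & tokens)

def cluster(docs):
    # group the entire result set (not just the current page) by label
    groups = {}
    for doc in docs:
        for label in classify(doc):
            groups.setdefault(label, []).append(doc)
    return groups
```

Running this server-side is the point of the SearchComponent approach: the million hits are grouped before anything crosses the wire to PHP.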


Query-node+shard stickiness?

2013-06-08 Thread Otis Gospodnetic
Hi,

Is there anything in SolrCloud that would support query-node/shard
affinity/stickiness?

What I mean by that is a mechanism that is smart enough to keep
sending the same query X to the same node(s)+shard(s)... with the goal
being better utilization of Solr and OS caches?

Example:
* Imagine a Collection with 2 shards and 3 replicas: s1r1, s1r2, s1r3,
s2r1, s2r2, s2r3
* Query for "Foo Bar" comes in and hits one of the nodes, say s1r1
* Since shard 2 needs to be queried, too, one of its 3 replicas needs
to be searched.  Say s2r1 gets searched
* 5 minutes later the same query for "Foo Bar" comes in, say it hits s1r1 again
* Again shard 2 needs to be searched.  But which of the 3 replicas
should be searched?
* Ideally that same s2r1 would be searched

Is there anything in SolrCloud that can accomplish this?
Or if there a place in SolrCloud where such "query hash ==>
node/shard" mapping could be implemented?

Thanks,
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
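The "query hash ==> node/shard" mapping described above is easy to sketch outside Solr. Illustrative Python only (replica names taken from the example; this is the mapping, not a SolrCloud feature):

```python
import hashlib

REPLICAS = {"shard1": ["s1r1", "s1r2", "s1r3"],
            "shard2": ["s2r1", "s2r2", "s2r3"]}

def pick_replica(query, shard):
    # hash the query text so the same query always lands on the same
    # replica of a given shard, keeping that replica's caches warm
    h = int(hashlib.md5(query.encode("utf-8")).hexdigest(), 16)
    nodes = REPLICAS[shard]
    return nodes[h % len(nodes)]
```

The same query string always maps to the same replica; note that adding or removing a replica reshuffles the whole mapping, which a consistent-hashing variant would mitigate.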


Re: Help required with fq syntax

2013-06-08 Thread Kamal Palei
Though the syntax looks fine, I get all the records. As per the example
given above, I get all the documents, meaning the filtering did not work. I am
curious to know whether my indexing went fine or not. I will check and report
back.


On Sun, Jun 9, 2013 at 7:21 AM, Otis Gospodnetic  wrote:

> Try:
>
> ...&q=*:*&fq=-blocked_company_ids:5
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei  wrote:
> > Dear All
> > I have a multi-valued field blocked_company_ids in index.
> >
> > You can think like
> >
> > 1. document1 , blocked_company_ids: 1, 5, 7
> > 2. document2 , blocked_company_ids: 2, 6, 7
> > 3. document3 , blocked_company_ids: 4, 5, 6
> >
> > and so on .
> >
> > If I want to retrieve all the documents  where blocked_company_id does
> not
> > contain one particular company id say 5.
> >
> > So my search result should give me only document2 as document1 and
> > document3 both contains 5.
> >
> > To achieve this how fq syntax looks like is it something like below
> >
> > &fq=blocked_company_ids:-5
> >
> > I tried like above syntax, but it gives me 0 record.
> >
> > Can somebody help me with the syntax please, and point me where all
> syntax
> > details are given.
> >
> > Thanks
> > Kamal
> > Net Cloud Systems
>


Re: Help required with fq syntax

2013-06-08 Thread Kamal Palei
Also please note that for some documents, blocked_company_ids may not be
present at all. In such cases that document should be present in the search
result as well.

BR,
Kamal


On Sun, Jun 9, 2013 at 7:07 AM, Kamal Palei  wrote:

> Dear All
> I have a multi-valued field blocked_company_ids in index.
>
> You can think like
>
> 1. document1 , blocked_company_ids: 1, 5, 7
> 2. document2 , blocked_company_ids: 2, 6, 7
> 3. document3 , blocked_company_ids: 4, 5, 6
>
> and so on .
>
> If I want to retrieve all the documents  where blocked_company_id does not
> contain one particular company id say 5.
>
> So my search result should give me only document2 as document1 and
> document3 both contains 5.
>
> To achieve this how fq syntax looks like is it something like below
>
> &fq=blocked_company_ids:-5
>
> I tried like above syntax, but it gives me 0 record.
>
> Can somebody help me with the syntax please, and point me where all syntax
> details are given.
>
> Thanks
> Kamal
> Net Cloud Systems
>
>


Re: Help required with fq syntax

2013-06-08 Thread Otis Gospodnetic
Try:

...&q=*:*&fq=-blocked_company_ids:5

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei  wrote:
> Dear All
> I have a multi-valued field blocked_company_ids in index.
>
> You can think like
>
> 1. document1 , blocked_company_ids: 1, 5, 7
> 2. document2 , blocked_company_ids: 2, 6, 7
> 3. document3 , blocked_company_ids: 4, 5, 6
>
> and so on .
>
> If I want to retrieve all the documents  where blocked_company_id does not
> contain one particular company id say 5.
>
> So my search result should give me only document2 as document1 and
> document3 both contains 5.
>
> To achieve this how fq syntax looks like is it something like below
>
> &fq=blocked_company_ids:-5
>
> I tried like above syntax, but it gives me 0 record.
>
> Can somebody help me with the syntax please, and point me where all syntax
> details are given.
>
> Thanks
> Kamal
> Net Cloud Systems
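The semantics of that filter — including Kamal's follow-up about documents where the field is absent — can be modeled in a few lines. Illustrative Python only, using the sample documents from the thread:

```python
docs = [
    {"id": "document1", "blocked_company_ids": [1, 5, 7]},
    {"id": "document2", "blocked_company_ids": [2, 6, 7]},
    {"id": "document3", "blocked_company_ids": [4, 5, 6]},
    {"id": "document4"},                      # field missing entirely
]

def fq_exclude(docs, field, value):
    # fq=-blocked_company_ids:5 keeps every document whose multi-valued
    # field does NOT contain the value, including docs without the field
    return [d["id"] for d in docs if value not in d.get(field, [])]
```

So `q=*:*&fq=-blocked_company_ids:5` returns document2 plus any document that has no blocked_company_ids at all, while `fq=blocked_company_ids:-5` is parsed as a search for the value "-5" and matches nothing.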


Help required with fq syntax

2013-06-08 Thread Kamal Palei
Dear All
I have a multi-valued field blocked_company_ids in index.

You can think like

1. document1 , blocked_company_ids: 1, 5, 7
2. document2 , blocked_company_ids: 2, 6, 7
3. document3 , blocked_company_ids: 4, 5, 6

and so on.

I want to retrieve all the documents where blocked_company_ids does not
contain one particular company id, say 5.

So my search result should give me only document2, as document1 and
document3 both contain 5.

To achieve this, how does the fq syntax look? Is it something like below?

&fq=blocked_company_ids:-5

I tried the above syntax, but it gives me 0 records.

Can somebody help me with the syntax, please, and point me to where all the
syntax details are given.

Thanks
Kamal
Net Cloud Systems


Re: does solr support query time only stopwords?

2013-06-08 Thread Otis Gospodnetic
Maybe returned hits match other query terms.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 8, 2013 6:34 PM, "jchen2000"  wrote:

> I wanted to analyze high-frequency terms using Solr's Luke request handler
> and keep updating the stopwords file for new queries from time to time.
> Obviously I have to index all terms whether they belong to the stopwords
> list or not.
>
> So I configured the query analyzer stopwords list but disabled the index
> analyzer stopwords list. However, it seems like the query returns all
> records containing stopwords after this.
>
> Does anybody have an idea why this would happen?
>
> ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Entire query is stopwords

2013-06-08 Thread Jan Høydahl
Remove the stopFilter from the "index" section of your fieldType, only keep it 
in the "query" section. This way your stopwords will always be indexed and 
edismax will be able to selectively remove stopwords from the query depending 
on whether all words are stopwords or not.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

5. juni 2013 kl. 21:36 skrev Vardhan Dharnidharka :

> 
> 
> 
> Hi, 
> 
> I am using the standard edismax parser and my example query is as follows:
> 
> {!edismax qf='object_description ' rows=10 start=0 mm=-40% v='object'}
> 
> In this case, 'object' happens to be a stopword in the StopWordsFilter in my 
> datatype 'object_description'. Now, since 'object' is not indexed at all, the 
> query does not return any results. In an ideal case, I would want documents 
> containing the term 'object' to be returned. 
> 
> What is the best practice to achieve this? Index stop-words and re-query with 
> 'stopwords=false'. Or can this be done without re-querying?
> 
> Thanks, 
> Vardhan 
> 
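Jan's suggestion corresponds to a fieldType along these lines — an illustrative schema.xml sketch (the type name and stopwords file are examples, not from the thread):

```xml
<!-- StopFilter only on the query side: every word is indexed, and edismax
     can decide at query time whether to drop stopwords. -->
<fieldType name="text_querystop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no StopFilterFactory here: all words get indexed -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Because the stopwords are in the index, an all-stopword query like "object" can still match; reindexing is required after moving the filter.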



does solr support query time only stopwords?

2013-06-08 Thread jchen2000
I wanted to analyze high-frequency terms using Solr's Luke request handler
and keep updating the stopwords file for new queries from time to time.
Obviously I have to index all terms whether they belong to the stopwords list
or not.

So I configured the query analyzer stopwords list but disabled the index
analyzer stopwords list. However, it seems like the query returns all records
containing stopwords after this.

Does anybody have an idea why this would happen?

ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0



--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lucene/Solr Filesystem tunings

2013-06-08 Thread Mark Miller
Turning swappiness down to 0 can have some decent performance impact.

- http://en.wikipedia.org/wiki/Swappiness

In the past, I've seen better performance with ext3 over ext4 around
commits/fsync. Tests were actually enough slower (lots of these operations)
that I made a special ext3 partition workspace for Lucene/Solr dev. (Still use
ext4 for root and home.)

Have not checked that recently, and it may not be a large concern for many use 
cases.

- Mark

On Jun 4, 2013, at 6:48 PM, Tim Vaillancourt  wrote:

> Hey all,
> 
> Does anyone have any advice or special filesytem tuning to share for 
> Lucene/Solr, and which file systems they like more?
> 
> Also, does Lucene/Solr care about access times if I turn them off (I think it
> doesn't care)?
> 
> A bit unrelated: What are people's opinions on reducing some consistency 
> things like filesystem journaling, etc (ext2?) due to SolrCloud's additional 
> HA with replicas? How about RAID 0 x 3 replicas or so?
> 
> Thanks!
> 
> Tim Vaillancourt
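For reference, the swappiness and access-time tunings discussed here usually look like this on Linux — illustrative commands and an example fstab line; benchmark on your own workload before committing to them:

```shell
# lower swappiness for the running system (requires root)
sysctl -w vm.swappiness=0
# make it persistent across reboots
echo 'vm.swappiness = 0' >> /etc/sysctl.conf

# example /etc/fstab entry mounting the index volume with access times off
# (device and mount point are placeholders):
# /dev/sdb1  /var/solr/data  ext4  defaults,noatime,nodiratime  0 2
```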



Re: index merge question

2013-06-08 Thread Mark Miller

On Jun 8, 2013, at 12:52 PM, Jamie Johnson  wrote:

> When merging through the core admin (
> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
> conflicts during the merge?  So for instance if I am merging core 1 and
> core 2 into core 0 (first example), what happens if core 1 and core 2 both
> have a document with the same key, say core 1 has a newer version of core
> 2?  Does the merge fail, does the newer document remain?

You end up with both documents, both with that ID - not generally a situation 
you want to end up in. You need to ensure unique id's in the input data or 
replace the index rather than merging into it.

> 
> Also if using the srcCore method if a document with key 1 is written while
> an index also with key 1 is being merged what happens?

It depends on the order I think - if the doc is written after the merge and 
it's an update, it will update the doc that was just merged in. If the merge 
comes second, you have the doc twice and it's a problem.

- Mark
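A toy model of the distinction Mark draws (plain Python, nothing Solr-specific): a Lucene-level merge concatenates segments without consulting the uniqueKey, while a normal add/update replaces by id.

```python
def raw_merge(*cores):
    # index merge: documents are concatenated; duplicate ids survive
    merged = []
    for core in cores:
        merged.extend(core)
    return merged

def add_doc(core, doc):
    # normal Solr add: any existing docs with the same id are replaced
    return [d for d in core if d["id"] != doc["id"]] + [doc]

core1 = [{"id": "A", "version": 1}]
core2 = [{"id": "A", "version": 2}]
```

After `raw_merge(core1, core2)` the index holds id "A" twice; a subsequent `add_doc` with id "A" collapses it back to one document, matching the "update after merge" case above.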

index merge question

2013-06-08 Thread Jamie Johnson
When merging through the core admin (
http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
conflicts during the merge?  So for instance if I am merging core 1 and
core 2 into core 0 (first example), what happens if core 1 and core 2 both
have a document with the same key, say core 1 has a newer version of core
2?  Does the merge fail, does the newer document remain?

Also if using the srcCore method if a document with key 1 is written while
an index also with key 1 is being merged what happens?


Re: custom field tutorial

2013-06-08 Thread Jack Krupansky
Usually, people want to do the opposite - store the numeric code as a 
numeric field for perceived efficiency and let the user query and view 
results with the text form. But, there isn't any evidence of any great 
performance benefit of doing so - just store the string code in a string 
field.


Also, your language is confusing - you say "a single integer field that maps 
to the string field" - do you actually want two separate fields? Is that the 
case? If so, just let the user query against either field depending on what 
their preference is for numeric or string codes.


And your language seems to indicate that you want the user to query by 
numeric code but the field would be indexed as a string code. Is that the 
case?


Maybe you could clarify your intentions.

Sure, with custom code, custom fields, custom codecs, custom query parsers, 
etc. you can do almost anything - but... the initial challenge for any Solr 
app developer is to first try and see if they can make do with the existing
capabilities.


-- Jack Krupansky

-Original Message- 
From: Anria Billavara

Sent: Saturday, June 08, 2013 2:54 AM
To: solr-user@lucene.apache.org
Subject: Re: custom field tutorial


You seem to know what you want the words to map to, so index the map.  Have 
one field for the word, one field for the mapped value, and at query time, 
search the words and return the mapped field. If it is comma separated, so 
be it and split it up in your code post search.

Otherwise, same as Wunder: in my many years in search this is an odd request.
Anria

Sent from my Samsung smartphone on AT&T

 Original message 
Subject: Re: custom field tutorial
From: Walter Underwood 
To: solr-user@lucene.apache.org
CC:

What are you trying to do? This seems really odd. I've been working in 
search for fifteen years and I've never heard this request.


You could always return all the fields to the client and ignore the ones you 
don't want.


wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:


can someone point me to a "custom field" tutorial.

i checked the wiki and this list - but still a little hazy on how i would do
this.

essentially - when the user issues a query, i want my class to interrogate a
string field (containing several codes - example boo, baz, bar) and return a
single integer field that maps to the string field (containing the code).

example:

boo=1
baz=2
bar=3

thx
mark
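One way to read Anria's advice as code: derive the integer field from the string codes at index time, so either field is queryable and the mapped value comes back with the document. A hypothetical Python sketch — the map is taken from the example above, and the field names are made up:

```python
CODE_MAP = {"boo": 1, "baz": 2, "bar": 3}    # codes from the example above

def enrich(doc):
    # split the comma-separated string field and add the mapped integers
    # as a sibling field before the document is indexed
    codes = [c.strip() for c in doc.get("codes", "").split(",")]
    doc["code_ids"] = [CODE_MAP[c] for c in codes if c in CODE_MAP]
    return doc
```

At query time the application searches either `codes` or `code_ids` and returns whichever field the caller prefers, with no custom field type needed.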