Re: How to limit result rows by field types?

2010-11-19 Thread Peter Wang
Erick Erickson  writes:

Thanks for your reply.

I failed to incorporate the IKAnalyzer[1] into solr-trunk,
so I am now using Solr 1.4.1 + the field collapsing patch[2].
It works fine.

[1] http://code.google.com/p/ik-analyzer/
[2] https://issues.apache.org/jira/browse/SOLR-236
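For reference, trunk exposes this through the grouping parameters, e.g.

  q=*:*&group=true&group.field=type&group.limit=1

while the SOLR-236 patch for 1.4.x uses its own collapse.* parameters (their
names vary between patch versions), e.g. collapse.field=type. Both lines are
rough sketches; "type" is just the example field from the original question.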
> It's already in trunk, so if you can use one of the nightly builds
> you could start using it now, see:
> https://hudson.apache.org/hudson/job/Solr-trunk/
>
> Best
> Erick
>
> On Tue, Nov 16, 2010 at 9:30 PM, Peter Wang  wrote:
>
>> Peter Wang  writes:
>>
>>
>> reply myself
>>
>> I found a PPT[1] about Solr; it calls this "Field Collapsing".
>>
>> Will it be added to Solr 1.5? Unfortunately, I am using Solr 1.4.
>>
>> For Solr 1.4, are there other solutions for such a task?
>>
>> [1] http://lucene-eurocon.org/slides/Solr-15-and-Beyond_Yonik-Seely.pdf
>>
>> > Hi, all.
>> >
>> > I am running Solr with multiple indexes, by Flattening Data Into a
>> > Single Index [1].
>> >
>> > A type field in the schema stands for the type of document; say it has
>> > the following options: book, movie, music.
>> >
>> > When querying, some types may have more result rows than others. For
>> > example, we need 3 result rows, one for each type, but with default
>> > ranking the results may contain only one type.
>> >
>> > My question is how to limit the result rows for each type, so that in
>> > the above case the result covers all types, one result for each type.
>> >
>> > Is there Solr/Lucene query syntax or some other way to do this?
>> >
>> > Currently, what I have is a separate query for each type, which limits
>> > the result rows per type. But that means running many queries, and it
>> > may be very slow when I have many types.
>> >
>> > Thanks for your suggestions.
>> >
>> > --
>> > [1]
>> >
>> http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index
>> >
>> >
>> > -peter
>>
>>



Re: Must require quote with single word token query?

2010-11-19 Thread Yonik Seeley
On Fri, Nov 19, 2010 at 9:41 PM, Chamnap Chhorn  wrote:
> Wow, I never knew this syntax before. What's that called?

I dubbed it "local params" since it adds local info to a parameter
(think extra metadata, like XML attributes on an element).

http://wiki.apache.org/solr/LocalParams

It's used mostly to invoke different query parsers, but it's also used
to add extra metadata to faceting commands too (and is required for
stuff like multi-select faceting):

http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
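Two quick illustrations (the field names here are invented): switching the
query parser with a local param, and tagging a filter so a facet can exclude
it for multi-select faceting:

  q={!dismax qf=title}solr rocks

  fq={!tag=fmt}format:book&facet.field={!ex=fmt}format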


-Yonik
http://www.lucidimagination.com



> On 11/19/10, Yonik Seeley  wrote:
>> On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chhorn
>>  wrote:
>>> I have one question related to a single-word-token field with the dismax
>>> query. In order for it to be found, I need to add quotes around the search
>>> query all the time. This is quite hard for me to do since it is part of
>>> full-text search.
>>>
>>> Here is my solr query and field type definition (Solr 1.4):
>>>    <fieldType name="keyphrase" class="solr.TextField" positionIncrementGap="100">
>>>      <analyzer>
>>>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>        <filter class="solr.TrimFilterFactory"/>
>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>                words="stopwords.txt" enablePositionIncrements="true"/>
>>>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>                ignoreCase="true" expand="false"/>
>>>      </analyzer>
>>>    </fieldType>
>>>
>>>    <field name="keyphrase" type="keyphrase" indexed="true"
>>>           stored="false" multiValued="true"/>
>>>
>>> With this query
>>> q=smart%20mobile&qf=keyphrase&debugQuery=on&defType=dismax,
>>> solr returns nothing. However, with quotes around the search query q="smart
>>> mobile"&qf=keyphrase&debugQuery=on&defType=dismax, the result is found.
>>>
>>> Is it a must to use quotes for a single-word-token field?
>>
>> Yes, you must currently quote tokens if they contain whitespace -
>> otherwise the query parser first breaks on whitespace before doing
>> analysis on each part separately.
>>
>> Using dismax is an odd choice if you are only querying on keyphrase though.
>> You might look at the field query parser - it is a basic single-field
>> single-value parser with no operators (hence no need to escape any
>> special characters).
>>
>> q={!field f=keyphrase}smart%20mobile
>>
>> or you can decompose it using param dereferencing (sometimes easier to
>> construct)
>>
>> q={!field f=keyphrase v=$qq}&qq=smart%20mobile
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
> --
> Sent from my mobile device
>
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>


Re: Must require quote with single word token query?

2010-11-19 Thread Chamnap Chhorn
Wow, I never knew this syntax before. What's that called?

On 11/19/10, Yonik Seeley  wrote:
> On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chhorn
>  wrote:
>> I have one question related to a single-word-token field with the dismax
>> query. In order for it to be found, I need to add quotes around the search
>> query all the time. This is quite hard for me to do since it is part of
>> full-text search.
>>
>> Here is my solr query and field type definition (Solr 1.4):
>>    <fieldType name="keyphrase" class="solr.TextField" positionIncrementGap="100">
>>      <analyzer>
>>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.TrimFilterFactory"/>
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>                words="stopwords.txt" enablePositionIncrements="true"/>
>>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                ignoreCase="true" expand="false"/>
>>      </analyzer>
>>    </fieldType>
>>
>>    <field name="keyphrase" type="keyphrase" indexed="true"
>>           stored="false" multiValued="true"/>
>>
>> With this query
>> q=smart%20mobile&qf=keyphrase&debugQuery=on&defType=dismax,
>> solr returns nothing. However, with quotes around the search query q="smart
>> mobile"&qf=keyphrase&debugQuery=on&defType=dismax, the result is found.
>>
>> Is it a must to use quotes for a single-word-token field?
>
> Yes, you must currently quote tokens if they contain whitespace -
> otherwise the query parser first breaks on whitespace before doing
> analysis on each part separately.
>
> Using dismax is an odd choice if you are only querying on keyphrase though.
> You might look at the field query parser - it is a basic single-field
> single-value parser with no operators (hence no need to escape any
> special characters).
>
> q={!field f=keyphrase}smart%20mobile
>
> or you can decompose it using param dereferencing (sometimes easier to
> construct)
>
> q={!field f=keyphrase v=$qq}&qq=smart%20mobile
>
> -Yonik
> http://www.lucidimagination.com
>

-- 
Sent from my mobile device

Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


occasional exception

2010-11-19 Thread j...@nuatech.net
Hi,

I set up a Solr infrastructure a couple of months ago. So far the system has
worked well, but I occasionally get stuck in loops where I keep
getting 500s returned for commits.

my Tomcat Catalina logs on the Solr master show the following issue:

Nov 14, 2010 2:41:46 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[http://www.rte.ie/news/2000/0428/sport.html,
http://www.rte.ie/news/1999/1019/moriarty.html,
http://www.rte.ie/news/2000/0216/sport.html,
http://www.rte.ie/news/2000/0715/explosion.html
, http://www.rte.ie/news/1999/0514/monk.html,
http://www.rte.ie/news/2001/0515/goodman.html,
http://www.rte.ie/news/2002/0415/easttimor.html,
http://www.rte.ie/news/2001/0901/u2.html, ... (8 added)
]} 0 181
Nov 14, 2010 2:41:46 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.IllegalArgumentException: Increment must be zero or
greater: -2147483648
at
org.apache.lucene.analysis.Token.setPositionIncrement(Token.java:322)
at
org.apache.lucene.analysis.TokenWrapper.setPositionIncrement(TokenWrapper.java:93)
at
org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:228)
at
org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:38)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:189)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:828)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:809)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2683)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2655)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)


I have a single read-write Solr master instance that has 2 indexers
updating/committing to it over HTTP, and a third script deleting old
documents and then committing, also over HTTP.

I can't figure out what is wrong. A trawl through Google suggests issues
with custom tokenizers, but I am using the built-in ones in Solr 1.4.1. I am
running on the latest Tomcat 6, with java version "1.6.0_17",
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64),
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode).

Any help or pointers would be appreciated.

Regards,
John

-- 
_
John G. Moylan


Re: DIH full-import failure, no real error message

2010-11-19 Thread Erik Fäßler
Yes, I noticed just after sending the message.
My apologies!

Best,

Erik

Am 20.11.2010 um 00:32 schrieb Chris Hostetter :

> 
> : Subject: DIH full-import failure, no real error message
> : References: 
> : In-Reply-To: 
> 
> http://people.apache.org/~hossman/#threadhijack
> Thread Hijacking on Mailing Lists
> 
> When starting a new discussion on a mailing list, please do not reply to 
> an existing message, instead start a fresh email.  Even if you change the 
> subject line of your email, other mail headers still track which thread 
> you replied to and your question is "hidden" in that thread and gets less 
> attention.   It makes following discussions in the mailing list archives 
> particularly difficult.
> 
> 
> -Hoss


Re: Dismax - Boosting

2010-11-19 Thread Ahmet Arslan
> The below is my previous configuration, which used to work
> correctly.
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>  <str name="queryAnalyzerFieldType">textSpell</str>
>  <lst name="spellchecker">
>   <str name="name">default</str>
>   <str name="field">searchFields</str>
>   <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str>
>   <str name="buildOnCommit">true</str>
>  </lst>
> </searchComponent>
> 
> We used to search only one field, "searchFields", but after
> implementing dismax we are searching different fields,
> like
>
> title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint
> category isbn13 isbn10 format series season bisacsub award.
>
> Do we need to modify the above configuration to include all
> the above fields? Please give me an example.

Searching and spell checking are independent. For example, you can search on 10
fields and create suggestions from 2 fields. The spell checker accepts one field
in its configuration, so you need to populate that field with copyField, using
the fields you want spell checking on. The type of this field should be
textSpell in your case. You can keep the above config.
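A sketch, assuming a dedicated field named "spell" of type textSpell, and
that title, author and desc are among the fields you want suggestions from
(adjust the names to your schema):

  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

  <copyField source="title" dest="spell"/>
  <copyField source="author" dest="spell"/>
  <copyField source="desc" dest="spell"/>

Then point the spellchecker's <str name="field"> at "spell" instead of
searchFields.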

> 
> In the past we used to query twice: first to get the
> suggestions, and then with the first suggestion to show the data.
>
> Is there a way that we can do it in one step?

Are you talking about queries that return 0 numFound, re-executing the search
as described here: http://sematext.com/products/dym-researcher/index.html ?

Not out-of-the-box.





Re: DIH full-import failure, no real error message

2010-11-19 Thread Chris Hostetter

: Subject: DIH full-import failure, no real error message
: References: 
: In-Reply-To: 

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


-Hoss


Re: Per field facet limit

2010-11-19 Thread Chris Hostetter
: The wiki on facet.limit
: (http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit) says
: "This parameter can be specified on a per field basis to indicate a
: separate limit for certain fields." But it is not specified how to
: specify a specific field. How do you do this?

it's explained at the top of the Parameters section...

http://wiki.apache.org/solr/SimpleFacetParameters#Parameters

>> Note that many parameters may be overridden on a per-field basis with 
>> the  following syntax:
>>
>> * f.<fieldName>.<FacetParam>=<value>
>> ...

...and then there is an example
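For instance, with an assumed field named category:

  facet.field=category&f.category.facet.limit=5

limits only the category facet to 5 values.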


-Hoss


Re: Need Middleware between search client and solr?

2010-11-19 Thread Dan Lynn
You might be able to skip a front-end to Solr by making extensive use
of XSL to format the results, but there are several other arguments for
putting code in front of Solr (e.g. saved searches, custom sorting,
result-level embedded actions, etc.).
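For example, the stock XSLTResponseWriter applies a stylesheet from conf/xslt
at query time (example.xsl ships with the Solr example):

  http://localhost:8983/solr/select?q=*:*&wt=xslt&tr=example.xsl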


Cheers,
Dan

On 11/19/2010 01:58 PM, cyang2010 wrote:

Hi,

I am new to Lucene/Solr. I have a very general question, and hope to
hear your recommendation.

Do you need a middleware/module between your search client and the Solr
server? The response message is very Solr-specific. Do you need to translate
it into an application object model before returning it to the search client?
In that case, I am thinking of having a search module in a middleware server.
It would route/decorate the search request to the Solr server and, after
getting the Solr response, package it into an application object list to
return to the search client. Does that make sense?

My concern is whether this unnecessarily adds a network layer and slows down
the search. But from the application point of view, I see that it is
necessary. What do you think?

Thanks,


cy




Re: Need Middleware between search client and solr?

2010-11-19 Thread Gora Mohanty
On Sat, Nov 20, 2010 at 2:28 AM, cyang2010  wrote:
[...]
> Do you need a middleware/module between your search client and the Solr
> server? The response message is very Solr-specific. Do you need to translate
> it into an application object model before returning it to the search client?
> In that case, I am thinking of having a search module in a middleware server.
> It would route/decorate the search request to the Solr server and, after
> getting the Solr response, package it into an application object list to
> return to the search client. Does that make sense?

I believe that having a front-end to Solr is a very typical use case. As you
refer to middleware above, introducing another layer between the front-end
and Solr search on the back-end might or might not make sense, depending
on your requirements. If you are using a web development framework, there
are already implementations of many such middleware layers that provide an
interface to Solr in a manner more "natural" to users of the framework. Though
Solr search is usually blazingly fast, the overhead of such middleware should be
reasonable compared to the advantages it provides.

> My concern is whether this unnecessarily adds a network layer and slows down
> the search. But from the application point of view, I see that it is
> necessary. What do you think?
[...]

Well, presumably, one's search requirements stem from a website/application
that also provides other functionality. In such a case, a layer over Solr
seems almost unavoidable.

Regards,
Gora


Re: DIH full-import failure, no real error message

2010-11-19 Thread Erik Fäßler

 Hello Erick,

I guess I'm the one asking for pardon - but surely not you! It seems
your first guess could already be the correct one. Disk space IS kind
of short and I believe it could have run out; since Solr performs a
rollback after the failure, I didn't notice (besides the fact that this
is one of our server machines - but apparently the wrong mount point...).

I'm not yet absolutely sure of this, but it would explain a lot and it
really looks like it. So thank you for this maybe-not-so-obvious hint :)


But you also mentioned the merging strategy. I left everything at the
defaults that come with the Solr download. Could it be that such a
large index needs different treatment? Could you point me to a wiki
page or something where I can get a few tips?


Thanks a lot, I will try building the index on a partition with enough 
space, perhaps that will already do it.


Best regards,

Erik

Am 16.11.2010 14:19, schrieb Erick Erickson:

Several questions. Pardon me if they're obvious, but I've spent far
too much of my life overlooking the obvious...

1>  Is it possible you're running out of disk? 40-50G could suck up
a lot of disk, especially when merging. You may need that much again
free when a merge occurs.
2>  Speaking of merging, what are your merge settings? How are you
triggering merges? See <mergeFactor> and associated settings in solrconfig.xml.
3>  You might get some insight by removing the Solr indexing part: can
you spin through your parsing from beginning to end? That would
eliminate your questions about whether your XML parsing is the
problem.


40-50G is a large index, but it's certainly within Solr's capability,
so you're not hitting any built-in limits.

My first guess would be that you're running out of disk, at least
that's the first thing I'd check next...

Best
Erick

On Tue, Nov 16, 2010 at 3:33 AM, Erik Fäßler wrote:


  Hey all,

I'm trying to create a Solr index for the 2010 Medline-baseline (
www.pubmed.gov, over 18 million XML documents). My goal is to be able to
retrieve single XML documents by their ID. Each document comes with a unique
ID, the PubMedID. So my schema (important portions) looks like this:

  <!-- the <field> definitions for pmid, date and xml were stripped in the archive -->

  <uniqueKey>pmid</uniqueKey>
  <defaultSearchField>pmid</defaultSearchField>

pmid holds the ID, date holds the creation date; xml holds the whole XML
document (mostly below 5kb). I used the DataImporter to do this. I had to
write some classes (DataSource, EntityProcessor, DateFormatter) myself, so
theoretically the error could lie there.

What happens is that indexing looks just fine at the beginning. Memory
usage is well below the maximum (max of 20g, usage below 5g, most of the
time around 3g). It goes on for several hours in this manner until it suddenly
stops. I tried this a few times with minor tweaks, none of which made any
difference. The last time such a crash occurred, over 16.5 million documents
had already been indexed (argh, so close...). It never stops at the same
document, and indexing the documents where the error occurred again just
runs fine. Index size on disk was between 40g and 50g the last time I had a
look.

This is the log from beginning to end:

(I decided to just attach the log for the sake of readability ;) ).

As you can see, Solr's error message is not quite complete. There are no
closing brackets. The document is cut in half in this message, and not even
the error message itself is complete: the 'D' of
(D)ataImporter.runCmd(DataImporter.java:389) right after the document text
is missing.

I have one thought concerning this: I get the input documents as an
InputStream which I read buffer-wise (at most 1000 bytes per read() call). I
need to deliver the documents in one large byte array to the XML parser I
use (VTD XML).
But I don't get only the individual small XML documents; I always get one
larger XML blob with exactly 30,000 of these documents. I use a self-written
EntityProcessor to extract the single documents from the larger blob. These
blobs have a size of about 50 to 150mb. So what I do is read these large
blobs in 1000-byte steps and store each byte array in an ArrayList.
Afterwards, I create the final byte[] and do System.arraycopy from the
ArrayList into the byte[].
I tested this and it looks fine to me. And as I said, indexing the
documents where the error occurred just works fine (that is, indexing the
whole blob containing the single document). I just mention this because it
kind of looks like there is this cut in the document, and the missing 'D'
reminds me of char-encoding errors. But I don't know for sure; opening the
error log in vi doesn't show any broken characters (the last time I had such
problems, vi could identify the characters in question while other editors
just wouldn't show them).
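In rough code, the read logic is something like this (a simplified sketch,
not the actual class; buffer size and names are just illustrative):

  import java.io.IOException;
  import java.io.InputStream;
  import java.util.ArrayList;
  import java.util.List;

  public class BlobReader {
      // Read the stream in chunks of at most 1000 bytes, then concatenate
      // all chunks into a single byte[] for the XML parser.
      public static byte[] readAll(InputStream in) throws IOException {
          List<byte[]> chunks = new ArrayList<byte[]>();
          byte[] buf = new byte[1000];
          int total = 0;
          int n;
          while ((n = in.read(buf)) != -1) {
              byte[] chunk = new byte[n];
              System.arraycopy(buf, 0, chunk, 0, n);
              chunks.add(chunk);
              total += n;
          }
          byte[] all = new byte[total];
          int pos = 0;
          for (byte[] chunk : chunks) {
              System.arraycopy(chunk, 0, all, pos, chunk.length);
              pos += chunk.length;
          }
          return all;
      }
  }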

Further ideas from my side: Is the index too big? I think I read somewhere
that a large index is around 10 million documents, and I aim to
approximately double that number. But would this cause such an error? In the
end: what exactly IS the error?

Sorry for the lot of text, just tryin

Need Middleware between search client and solr?

2010-11-19 Thread cyang2010

Hi,

I am new to Lucene/Solr. I have a very general question, and hope to
hear your recommendation.

Do you need a middleware/module between your search client and the Solr
server? The response message is very Solr-specific. Do you need to translate
it into an application object model before returning it to the search client?
In that case, I am thinking of having a search module in a middleware server.
It would route/decorate the search request to the Solr server and, after
getting the Solr response, package it into an application object list to
return to the search client. Does that make sense?

My concern is whether this unnecessarily adds a network layer and slows down
the search. But from the application point of view, I see that it is
necessary. What do you think?

Thanks,


cy


Re: Dismax - Boosting

2010-11-19 Thread Solr User
Hi Ahmet,

The below is my previous configuration, which used to work correctly.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">textSpell</str>
 <lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">searchFields</str>
  <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str>
  <str name="buildOnCommit">true</str>
 </lst>
</searchComponent>

We used to search only one field, "searchFields", but after implementing
dismax we are searching different fields, like

title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13
isbn10 format series season bisacsub award.

Do we need to modify the above configuration to include all the above
fields? Please give me an example.

In the past we used to query twice: first to get the suggestions, and then
with the first suggestion to show the data.

Is there a way that we can do it in one step?

Thanks,

Murali




On Wed, Nov 17, 2010 at 7:00 PM, Ahmet Arslan  wrote:

>
> > 2. How to use spell checker request handler along with
> > dismax?
>
> Just append this at the end of dismax request handler definition:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
>
>
>
>


Re: How to Transmit and Append Indexes

2010-11-19 Thread Gora Mohanty
On Sat, Nov 20, 2010 at 12:39 AM, Bing Li  wrote:
> Hi, Gora,
>
> No, I really wonder: is Solr based on Hadoop?

As far as I know, no, it isn't.

> Hadoop is efficient when used in search engines since it suits the
> write-once-read-many model. After reading your emails, it looks like Solr's
> distributed file system does the same thing. Both of them are good for
> searching large indexes in a large-scale distributed environment, right?
[...]

Are you talking about distributed Solr search, such as Solr on the Cloud:
http://wiki.apache.org/solr/SolrCloud ? Someone more familiar with Solr
can correct me if I am wrong, but I do not believe that this does a
map/reduce like Hadoop provides.

Unless I am even more confused than usual, Hadoop provides a distributed
file system (HDFS) and a framework for doing map/reduce. This is a generic
framework, and no built-in search capabilities are available. People have tried
to use Solr/Lucene on HDFS, but I am not very sure whether anyone has
used map/reduce techniques for search, indexing, or other items with Solr/Lucene
and Hadoop.

Regards,
Gora


Re: How to Transmit and Append Indexes

2010-11-19 Thread Bing Li
Hi, Gora,

No, I really wonder: is Solr based on Hadoop?

Hadoop is efficient when used in search engines since it suits the
write-once-read-many model. After reading your emails, it looks like Solr's
distributed file system does the same thing. Both of them are good for
searching large indexes in a large-scale distributed environment, right?

Thanks!
Bing


On Sat, Nov 20, 2010 at 3:01 AM, Gora Mohanty  wrote:

> On Sat, Nov 20, 2010 at 12:05 AM, Bing Li  wrote:
> > Dear Erick,
> >
> > Thanks so much for your help! I am new to Solr, so I have no idea about
> > the version.
>
> The solr/admin/registry.jsp URL on your local Solr installation should show
> you the version at the top.
>
> > But I wonder, what are the differences between Solr and Hadoop? It seems
> > that Solr does the same as what Hadoop promises.
> [...]
>
> Er, what? Solr and Hadoop are entirely different applications. Did you
> mean Lucene or Nutch, instead of Hadoop?
>
> Regards,
> Gora
>


Re: How to Transmit and Append Indexes

2010-11-19 Thread Gora Mohanty
On Sat, Nov 20, 2010 at 12:05 AM, Bing Li  wrote:
> Dear Erick,
>
> Thanks so much for your help! I am new to Solr, so I have no idea about the
> version.

The solr/admin/registry.jsp URL on your local Solr installation should show
you the version at the top.

> But I wonder, what are the differences between Solr and Hadoop? It seems that
> Solr does the same as what Hadoop promises.
[...]

Er, what? Solr and Hadoop are entirely different applications. Did you
mean Lucene or Nutch, instead of Hadoop?

Regards,
Gora


Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-19 Thread Shanmugavel SRD

Thanks Erick.


Re: How to Transmit and Append Indexes

2010-11-19 Thread Bing Li
Dear Erick,

Thanks so much for your help! I am new to Solr, so I have no idea about the
version.

But I wonder, what are the differences between Solr and Hadoop? It seems that
Solr does the same as what Hadoop promises.

Best,
Bing

On Sat, Nov 20, 2010 at 2:28 AM, Erick Erickson wrote:

> You haven't said what version of Solr you're using, but you're
> asking about replication, which is built-in.
> See: http://wiki.apache.org/solr/SolrReplication
>
> And no, your slave doesn't block while the update is happening,
> and it automatically switches to the updated index upon
> successful replication.
>
> Older versions of Solr used rsynch & etc.
>
> Best
> Erick
>
> On Fri, Nov 19, 2010 at 10:52 AM, Bing Li  wrote:
>
>> Hi, all,
>>
>> I am working on a distributed searching system. Right now I have only one
>> server.
>> It has to crawl pages from the Web, generate indexes locally and respond
>> to users' queries. I think it is too busy to work smoothly.
>>
>> I plan to use at least two servers. The jobs of crawling pages and
>> generating
>> indexes are done by one of them. After that, the newly available indexes
>> should be transmitted to the other one, which is responsible for answering
>> users' queries. From the users' point of view, this system must be fast.
>> However, I don't know how I can get at the additional indexes so I can
>> transmit them. After transmission, how do I append them to the old indexes?
>> Does the appending block searching?
>>
>> Thanks so much for your help!
>>
>> Bing Li
>>
>
>


Re: How to Transmit and Append Indexes

2010-11-19 Thread Erick Erickson
You haven't said what version of Solr you're using, but you're
asking about replication, which is built-in.
See: http://wiki.apache.org/solr/SolrReplication

And no, your slave doesn't block while the update is happening,
and it automatically switches to the updated index upon
successful replication.

Older versions of Solr used rsync, etc.
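For reference, a minimal 1.4-style setup from that wiki page looks roughly
like this (host, port and poll interval are placeholders):

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>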

Best
Erick

On Fri, Nov 19, 2010 at 10:52 AM, Bing Li  wrote:

> Hi, all,
>
> I am working on a distributed searching system. Right now I have only one server.
> It has to crawl pages from the Web, generate indexes locally and respond
> to users' queries. I think it is too busy to work smoothly.
>
> I plan to use at least two servers. The jobs of crawling pages and generating
> indexes are done by one of them. After that, the newly available indexes
> should be transmitted to the other one, which is responsible for answering
> users' queries. From the users' point of view, this system must be fast.
> However, I don't know how I can get at the additional indexes so I can
> transmit them. After transmission, how do I append them to the old indexes?
> Does the appending block searching?
>
> Thanks so much for your help!
>
> Bing Li
>


Re: Is it fine to transmit indexes in this way?

2010-11-19 Thread Gora Mohanty
On Fri, Nov 19, 2010 at 11:39 PM, Bing Li  wrote:
[...]
> When updates are replicated to slave servers, I assume the updates are
> merged with the existing indexes and reads on them can proceed concurrently.
> If so, queries can be answered instantly. That's what I mean by "appending".
> Does this happen in Solr?
[...]

If you look at the last point in the section "How does the slave replicate?"
on the replication page, http://wiki.apache.org/solr/SolrReplication , you
will note that a commit is issued on the slave Solr server *after* replication
finishes, so that new/updated documents become available for querying
only then.

I do not personally have much experience with this, but if you need a
real-time search feature like you seem to be describing above, I would look at
http://wiki.apache.org/solr/NearRealtimeSearch
http://wiki.apache.org/solr/NearRealtimeSearchTuning
and recent threads on the subject on this mailing list.

Regards,
Gora


Re: Is it fine to transmit indexes in this way?

2010-11-19 Thread Bing Li
Thanks so much, Gora!

What do you mean by appending? If you mean adding to an existing index
(on reindexing, this would normally mean an update for an existing Solr
document ID, and a create for a new Solr document ID), the best way
probably is not to delete the index on the master server (what you call
machine A). Once the indexing is completed, a commit ensures that new
documents show up for any subsequent queries.

When updates are replicated to slave servers, I assume the updates are
merged with the existing indexes and reads on them can proceed concurrently.
If so, queries can be answered instantly. That's what I mean by "appending".
Does this happen in Solr?

Best,
Bing

On Sat, Nov 20, 2010 at 1:58 AM, Gora Mohanty  wrote:

> On Fri, Nov 19, 2010 at 10:53 PM, Bing Li  wrote:
> > Hi, all,
> >
> > Since I didn't find that Lucene presents updated indexes to us, may I
> > transmit indexes in the following way?
> >
> > 1) One indexing machine, A, is busy with generating indexes;
> >
> > 2) After a certain time, the indexing process is terminated;
> >
> > 3) Then, the new indexes are transmitted to machines which serve users'
> > queries;
>
> Just replied to a similar question in another thread. The best way
> is probably to use Solr replication:
> http://wiki.apache.org/solr/SolrReplication
>
> You can set up replication to happen automatically upon commit on the
> master server (where the new index was made). As a commit should
> have been made when indexing is complete on the master server, this
> will then ensure that a new index is replicated on the slave server.
>
> > 4) It is possible that some index files have the same names. So the
> > conflicting files should be renamed;
>
> Replication will handle this for you.
>
> > 5) After the transmission is done, the transmitted indexes are removed
> from
> > A.
> >
> > 6) After the removal, the indexing process is started again on A.
> [...]
>
> These two items you have to do manually, i.e., delete all documents
> on A, and restart the indexing.
>
>
> > And, may I append them to
> existing indexes?
> > Does the appending affect the querying?
> [...]
>
> What do you mean by appending? If you mean adding to an existing index
> (on reindexing, this would normally mean an update for an existing Solr
> document ID, and a create for a new Solr document ID), the best way
> probably is not to delete the index on the master server (what you call
> machine A). Once the indexing is completed, a commit ensures that new
> documents show up for any subsequent queries.
>

> Regards,
> Gora
>


Re: Is it fine to transmit indexes in this way?

2010-11-19 Thread Gora Mohanty
On Fri, Nov 19, 2010 at 10:53 PM, Bing Li  wrote:
> Hi, all,
>
> Since I didn't find that Lucene presents updated indexes to us, may I
> transmit indexes in the following way?
>
> 1) One indexing machine, A, is busy with generating indexes;
>
> 2) After a certain time, the indexing process is terminated;
>
> 3) Then, the new indexes are transmitted to machines which serve users'
> queries;

Just replied to a similar question in another thread. The best way
is probably to use Solr replication:
http://wiki.apache.org/solr/SolrReplication

You can set up replication to happen automatically upon commit on the
master server (where the new index was made). As a commit should
have been made when indexing is complete on the master server, this
will then ensure that a new index is replicated on the slave server.

> 4) It is possible that some index files have the same names. So the
> conflicting files should be renamed;

Replication will handle this for you.

> 5) After the transmission is done, the transmitted indexes are removed from
> A.
>
> 6) After the removal, the indexing process is started again on A.
[...]

These two items you have to do manually, i.e., delete all documents
on A, and restart the indexing.


> And, may I append them to 
> existing indexes?
> Does the appending affect the querying?
[...]

What do you mean by appending? If you mean adding to an existing index
(on reindexing, this would normally mean an update for an existing Solr
document ID, and a create for a new Solr document ID), the best way
probably is not to delete the index on the master server (what you call
machine A). Once the indexing is completed, a commit ensures that new
documents show up for any subsequent queries.

Regards,
Gora


Re: Export Index Data.

2010-11-19 Thread Gora Mohanty
On Fri, Nov 19, 2010 at 10:33 PM, Anderson vasconcelos
 wrote:
> Hi
> Is it possible to export a set of documents indexed on one Solr server to do
> a synchronization with another Solr server?

Yes. The easiest way probably is to set up replication:
http://wiki.apache.org/solr/SolrReplication

However, if the Solr schema and configuration are the same it also
suffices to copy the Solr data directory from one server to another.
You will need to restart Solr on the new server.
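For example (paths and host name are placeholders; make sure no commits
happen during the copy):

  rsync -av /var/solr/data/ newserver:/var/solr/data/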

Regards,
Gora


Is it fine to transmit indexes in this way?

2010-11-19 Thread Bing Li
Hi, all,

Since I didn't find that Lucene presents updated indexes to us, may I
transmit indexes in the following way?

1) One indexing machine, A, is busy with generating indexes;

2) After a certain time, the indexing process is terminated;

3) Then, the new indexes are transmitted to machines which serve users'
queries;

4) It is possible that some index files have the same names. So the
conflicting files should be renamed;

5) After the transmission is done, the transmitted indexes are removed from
A.

6) After the removal, the indexing process is started again on A.

The reason I am trying to do that is to balance the search load. One
machine is responsible for generating indexes and the others are responsible
for responding to queries.

If the above approach does not work, may I see the index updates in
Lucene? May I transmit them? And may I append them to existing indexes?
Does the appending affect querying?

I am learning Solr. But it seems that Solr does that for me. However, I have
to set up Tomcat to use Solr. I think it is a little bit heavy.

Thanks!
Bing Li


Export Index Data.

2010-11-19 Thread Anderson vasconcelos
Hi
Is it possible to export a set of documents indexed on one Solr server to do
a synchronization with another Solr server?

Thanks


RE: DataImportHandlerException for custom DIH Transformer

2010-11-19 Thread Peter Sturge
Hi,

This problem is usually because your custom Transformer is in the
solr/lib folder, when it needs to be in the webapp's .war file (under
WEB-INF/lib, of course).
Place your custom Transformer in a .jar inside your .war and you should be
good to go.

Thanks,
Peter



Subject:
RE: DataImportHandlerException for custom DIH Transformer
From:
Vladimir Sutskever 
Date:
1969-12-31 19:00

I am experiencing a similar situation.

Any comments?


-Original Message-
From: Shashikant Kore [mailto:shashik...@gmail.com]
Sent: Wednesday, September 08, 2010 2:54 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandlerException for custom DIH Transformer

Resurrecting an old thread.

I faced the exact same problem as Tommy, and the jar was in {solr.home}/lib as
Noble had suggested.

My custom transformer overrides the following method, as per the specification
of the Transformer class.

public Object transformRow(Map<String, Object> row, Context context);

But, in the code (EntityProcessorWrapper.java), I see the following line.

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class);

This doesn't match the method signature in Transformer. I think this should
be

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class,
Context.class);

I have verified that adding a method transformRow(Map<String, Object> row)
works.

Am I missing something?
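For reference, a minimal transformer with the documented two-argument
signature, along the lines of the wiki's TrimTransformer (the package name
follows Tommy's example; the rest is a sketch):

  package com.chheng.dih.transformers;

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  // Trims leading/trailing whitespace from every String value in the row.
  public class TrimTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
          for (Map.Entry<String, Object> entry : row.entrySet()) {
              Object value = entry.getValue();
              if (value instanceof String) {
                  entry.setValue(((String) value).trim());
              }
          }
          return row;
      }
  }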

--shashi

2010/2/8 Noble Paul നോബിള്‍ नोब्ळ् 

On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng  wrote:

I'm having trouble making a custom DIH transformer in Solr 1.4. I
compiled the "General TrimTransformer" into a jar (just copy/pasted the
sample code from http://wiki.apache.org/solr/DIHCustomTransformer). I
placed the jar along with the dataimporthandler jar in solr/lib (same
directory as the jetty jar).

do not keep it in solr/lib, it won't work. keep it in {solr.home}/lib

Then I added to my DIH data-config.xml file:
transformer="DateFormatTransformer, RegexTransformer,
com.chheng.dih.transformers.TrimTransformer". Now I get this exception
when I try running the import:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodException:
com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120)

I noticed the exception lists
TrimTransformer.transformRow(java.util.Map) but the abstract
Transformer class defines a two-parameter method:
transformRow(Map row, Context context)?

--
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com

--
- Noble Paul | Systems Architect | AOL | http://aol.com


Re: String field with lower case filter

2010-11-19 Thread Ahmet Arslan

> But with the above configuration I am not getting any
> results. Does anybody have an idea?

class="solr.StrField" should be replaced with class="solr.TextField";
analyzers only take effect on TextField types.

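A minimal sketch of such a type (assuming a keyword tokenizer plus a
lowercase filter; the original snippet did not survive the archive):

  <fieldType name="cat_string" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>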

Re: Master/Slave High CPU Usage

2010-11-19 Thread Ofer Fort
That sounds like a great option, and it will also free some storage space on
that server (right now each index is about 130GB).
Other than the lock policy (we use single), are there any other things to
look out for?
Thanks


On Nov 19, 2010, at 05:30, Lance Norskog wrote:

If they are on the same server, you do not need to replicate.

If you only do queries, the query server can use the same index
directory as the master. Works quite well. Both have to have the same
LockPolicy in solrconfig.xml. For security reasons, I would run the
query server as a different user who has read-only access to the
index; that way it cannot touch the index.

On Wed, Nov 17, 2010 at 11:28 PM, Ofer Fort  wrote:

anybody?


On Wed, Nov 17, 2010 at 12:09 PM, Ofer Fort  wrote:


Hi, I'm working with Erez,

we experienced this again, and this time the slave index folder didn't
contain the index.XXX folder, only one index folder.

If we shut down the slave, the CPU on the master was normal; as soon as we
started the slave again, the CPU went up to 100% again.

thanks for any help

ofer


On Wed, Nov 17, 2010 at 11:15 AM, Erez Zarum  wrote:


Hi all,

We've been seeing this for the second time already.

I have a Solr (1.4.1) master and a slave. Both are located on the same
machine (16GB RAM, 4GB allocated to the slave and 3GB to the master).

All our updates are going towards the master, and all the queries are
towards the slave.

Once in a while the slave gets an OutOfMemoryError. This is not the big
problem (I have about 100M documents).

The problem is that from that moment the CPU of the slave AND the master is
almost 100%.

If i shutdown the slave, the CPU of the master drops.

If i start the slave again, the CPU is 100% again.

I have the replication set on commit and startup.

I see that the data folder contains three index folders: index,
index.XXXYYY and index.XXXYYY.ZZZ.


The only way I was able to get past it (it has worked two times already) is
to shut down the two servers, copy the whole index from the master to the
slave, and start them again.

From that moment on, they continue to work and replicate with very
reasonable CPU usage.


Our guess is that it failed to replicate due to the OOM, and since then it
tries to do a full replication again and again?

but why is the CPU of the master so high?






-- 
Lance Norskog
goks...@gmail.com


Re: Must require quote with single word token query?

2010-11-19 Thread Yonik Seeley
On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chhorn
 wrote:
> I have one question related to a single-word-token field with the dismax
> query. In order for it to be found, I need to add quotes around the search
> query all the time. This is quite hard for me to do since it is part of
> full-text search.
>
> Here is my solr query and field type definition (Solr 1.4):
>    <fieldType name="keyphrase" class="solr.TextField" positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.TrimFilterFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt" enablePositionIncrements="true"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="false"/>
>      </analyzer>
>    </fieldType>
>
>    <field name="keyphrase" type="keyphrase" indexed="true"
>           stored="false" multiValued="true"/>
>
> With this query q=smart%20mobile&qf=keyphrase&debugQuery=on&defType=dismax,
> solr returns nothing. However, with quotes around the search query q="smart
> mobile"&qf=keyphrase&debugQuery=on&defType=dismax, the result is found.
>
> Is it a must to use quotes for a single-word-token field?

Yes, you must currently quote tokens if they contain whitespace -
otherwise the query parser first breaks on whitespace before doing
analysis on each part separately.

Using dismax is an odd choice if you are only querying on keyphrase though.
You might look at the field query parser - it is a basic single-field
single-value parser with no operators (hence no need to escape any
special characters).

q={!field f=keyphrase}smart%20mobile

or you can decompose it using param dereferencing (sometimes easier to
construct)

q={!field f=keyphrase v=$qq}&qq=smart%20mobile

-Yonik
http://www.lucidimagination.com


How to Transmit and Append Indexes

2010-11-19 Thread Bing Li
Hi, all,

I am working on a distributed searching system. Now I have one server only.
It has to crawl pages from the Web, generate indexes locally and respond
users' queries. I think this is too busy for it to work smoothly.

I plan to use two servers at at least. The jobs to crawl pages and generate
indexes are done by one of them. After that, the new available indexes
should be transmitted to anther one which is responsible for responding
users' queries. From users' point of view, this system must be fast.
However, I don't know how I can get the additional indexes which I can
transmit. After transmission, how to append them to the old indexes? Does
the appending block searching?

Thanks so much for your help!

Bing Li


Use of "key" in facets

2010-11-19 Thread Tim Jones
Hi,
  I'm curious about the use of the "key" option with regard to facets.  I've
been treating it as a way to alias fields and queries in order to simplify
processing of result sets and generally make them more readable.  However it
seems that the alias is not respected when setting mincount, sort, etc
options for a particular facet.  The following example works fine with
1.4.1.

rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field=customers_name


  2


This version also works fine. Note the addition of the key, but keep in mind
that the original field name is still being used in the
f.customers_name.facet.mincount parameter.

rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field={!key=alt_name}customers_name


  2


This version does not work.  Now the key is being used in the facet
parameter (e.g. f.alt_name.facet.mincount).

rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.alt_name.facet.mincount=2&facet.field={!key=alt_name}customers_name


  2
  1
  0


What is the intended behavior of key in this case?  If it's really intended
to be an alias then it seems like it should work in these situations.  Am I
misunderstanding the meaning of key?

Thanks,
-Tim


String field with lower case filter

2010-11-19 Thread sivaprasad

Hi,

I am using a string field with the below configuration.

 <fieldType name="cat_string" class="solr.StrField" positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

One of the fields uses the field type "cat_string". I am using this
field as a facet and I am searching on it. While searching I need
case-insensitive search.

Let us say cat:"Games" or cat:"games" should give the same results.

But with the above configuration I am not getting any results. Does anybody
have an idea?

Regards,
Siva


Re: Issue with relevancy

2010-11-19 Thread Grant Ingersoll

On Nov 19, 2010, at 7:36 AM, sivaprasad wrote:

> 
> Hi,
> 
> I configured the search request handler as shown below.
> 
> <requestHandler name="standard" class="solr.SearchHandler">
>  <lst name="defaults">
>   <str name="echoParams">explicit</str>
>  </lst>
> </requestHandler>
> 
> I am submitting the below query for search.
> 
> http://localhost:8680/solr/catalogSearch/select?facet=true&spellcheck=true&indent=on&omitHeader=true&stats.field=sal_amt&stats.field=volume&stats=true&wt=xml&q=prod_n%3ADesktop+Computer+OR+manf_mdl_nbr%3ADesktop+Computer+OR+upc%3ADesktop+Computer+OR+Brands%3ADesktop+Computer&start=0&rows=60&fl=*,score&debugQuery=on
> 
> I am getting the below results, but for the first doc the score is higher
> than for the second doc, even though the prod_n only has the word "Computers".

It looks like the first result is shorter, so the length norm is higher, which
is why it appears higher in the result list. This value is captured in the
fieldNorm in the explains below.
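If you don't want document length to influence scoring on that field at all,
one option (an assumption about your schema, not something from this thread)
is to disable norms on it and reindex:

  <field name="prod_n" type="text" indexed="true" stored="true" omitNorms="true"/>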

More below...


> 
> 1)
> 
> 1.6884389
> GN Netcom
> Headsets & Microphones
> 79634
> 
> http://image.shopzilla.com/resize?sq=60&uid=1640146921
> 
> 482
> 1640146921
> Computer Headset
> 17.95
> EXTERNAL
> 
> 
> 2)
> 
> 1.4326878
> Desktop Computers
> 1565338
> 
> http://image.shopzilla.com/resize?sq=60&uid=1983384776
> 
> 461
> 1983384776
> Rain Computers ION 6-Core DAW Computer
> 1799.0
> EXTERNAL
> 
> 
> I want to push the first doc down to second place.
> 
> 
> The debug query analysis is given below.
> 
> 
> 
> 
> prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
> Computer OR Brands:Desktop Computer
> 
> 
> 
> prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
> Computer OR Brands:Desktop Computer
> 
> 
> 
> +prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
> prod_n:comput Brands:Desktop +prod_n:comput
> 
> 
> 
> +prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
> prod_n:comput Brands:Desktop +prod_n:comput
> 
> 
> 
> 
> 
> 
> 1.6884388 = (MATCH) product of:
>  2.701502 = (MATCH) sum of:
>0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>1.0 = tf(termFreq(prod_n:comput)=1)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.625 = fieldNorm(field=prod_n, doc=35844)
>0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>1.0 = tf(termFreq(prod_n:comput)=1)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.625 = fieldNorm(field=prod_n, doc=35844)
>0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>1.0 = tf(termFreq(prod_n:comput)=1)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.625 = fieldNorm(field=prod_n, doc=35844)
>0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>1.0 = tf(termFreq(prod_n:comput)=1)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.625 = fieldNorm(field=prod_n, doc=35844)
>0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
>1.0 = tf(termFreq(prod_n:comput)=1)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.625 = fieldNorm(field=prod_n, doc=35844)
>  0.625 = coord(5/8)
> 
> 
> 
> 
> 1.4326876 = (MATCH) product of:
>  2.2923002 = (MATCH) sum of:
>0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
>1.4142135 = tf(termFreq(prod_n:comput)=2)
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.375 = fieldNorm(field=prod_n, doc=57069)
>0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
>  0.18838727 = queryWeight(prod_n:comput), product of:
>4.5888486 = idf(docFreq=3518, maxDocs=127361)
>0.041053277 = queryNorm
>  2.4336042 = (MATCH) fieldWeight(prod_

Re: Dismax is failing with json response writer

2010-11-19 Thread sivaprasad

The issue is solved. I replaced the Solr core jar.

Thanks Erick


Issue with relevancy

2010-11-19 Thread sivaprasad

Hi,

I configured the search request handler as shown below.

<requestHandler name="standard" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="echoParams">explicit</str>
 </lst>
</requestHandler>

I am submitting the below query for search.

http://localhost:8680/solr/catalogSearch/select?facet=true&spellcheck=true&indent=on&omitHeader=true&stats.field=sal_amt&stats.field=volume&stats=true&wt=xml&q=prod_n%3ADesktop+Computer+OR+manf_mdl_nbr%3ADesktop+Computer+OR+upc%3ADesktop+Computer+OR+Brands%3ADesktop+Computer&start=0&rows=60&fl=*,score&debugQuery=on

I am getting the below results, but for the first doc the score is higher
than for the second doc, even though the prod_n only has the word "Computers".

1)

1.6884389
GN Netcom
Headsets & Microphones
79634

http://image.shopzilla.com/resize?sq=60&uid=1640146921

482
1640146921
Computer Headset
17.95
EXTERNAL


2)

1.4326878
Desktop Computers
1565338

http://image.shopzilla.com/resize?sq=60&uid=1983384776

461
1983384776
Rain Computers ION 6-Core DAW Computer
1799.0
EXTERNAL


I want to push the first doc down to second place.


The debug query analysis is given below.




prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
Computer OR Brands:Desktop Computer



prod_n:Desktop Computer OR manf_mdl_nbr:Desktop Computer OR upc:Desktop
Computer OR Brands:Desktop Computer



+prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
prod_n:comput Brands:Desktop +prod_n:comput



+prod_n:comput prod_n:comput manf_mdl_nbr:comput prod_n:comput upc:Desktop
prod_n:comput Brands:Desktop +prod_n:comput






1.6884388 = (MATCH) product of:
  2.701502 = (MATCH) sum of:
0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
1.0 = tf(termFreq(prod_n:comput)=1)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.625 = fieldNorm(field=prod_n, doc=35844)
0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
1.0 = tf(termFreq(prod_n:comput)=1)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.625 = fieldNorm(field=prod_n, doc=35844)
0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
1.0 = tf(termFreq(prod_n:comput)=1)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.625 = fieldNorm(field=prod_n, doc=35844)
0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
1.0 = tf(termFreq(prod_n:comput)=1)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.625 = fieldNorm(field=prod_n, doc=35844)
0.5403004 = (MATCH) weight(prod_n:comput in 35844), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.8680303 = (MATCH) fieldWeight(prod_n:comput in 35844), product of:
1.0 = tf(termFreq(prod_n:comput)=1)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.625 = fieldNorm(field=prod_n, doc=35844)
  0.625 = coord(5/8)




1.4326876 = (MATCH) product of:
  2.2923002 = (MATCH) sum of:
0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
1.4142135 = tf(termFreq(prod_n:comput)=2)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.375 = fieldNorm(field=prod_n, doc=57069)
0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
1.4142135 = tf(termFreq(prod_n:comput)=2)
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.375 = fieldNorm(field=prod_n, doc=57069)
0.45846006 = (MATCH) weight(prod_n:comput in 57069), product of:
  0.18838727 = queryWeight(prod_n:comput), product of:
4.5888486 = idf(docFreq=3518, maxDocs=127361)
0.041053277 = queryNorm
  2.4336042 = (MATCH) fieldWeight(prod_n:comput in 57069), product of:
 

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich

 Hi,

the final solution is explained here in context:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e

"

If you are using Solr branch_3x or trunk, you can turn this off by
setting autoGeneratePhraseQueries to false in the fieldType.

By enabling this option, phrase queries are only created by the
query parser when you enclose stuff in double quotes.

If you are using an older version of Solr such as 1.4.x, then you can
only hack it by adding a PositionFilterFactory to the end of your
query analyzer.
The downside to that approach (unfortunately the only approach for
older versions) is that it completely disables phrase queries across
the board for that field type.

"
So, it is not a bug in the WordDelimiterFilter.
Thanks to Robert!
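Rough sketches of both options (type names and analyzer contents are
assumptions, not the config from this thread):

  <!-- branch_3x / trunk: disable auto phrase queries on the type -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
             autoGeneratePhraseQueries="false">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- 1.4.x workaround: flatten positions at the end of the query analyzer -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>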

Regards,
Peter.


 Hi,

I am going crazy, but which config is necessary to include the missing
doc 2?

I have:
doc1 tw:aBc
doc2 tw:abc

Now a query "aBc" returns only doc 1, although when I try doc2 from
admin/analysis.jsp, the term text 'abc' of the index gets highlighted as
intended.
I even indexed a simple example (no stopwords, no protwords, no synonyms)
via* and tried this with the standard and dismax handlers, but I cannot make
it work :-/


What have I misunderstood?

Regards,
Peter.


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>
--


*
books.csv:

id,tw
1,aBc
2,abc

curl http://localhost:8983/solr/update/csv?commit=true --data-binary 
@books.csv -H 'Content-type:text/plain; charset=utf-8'







Re: simple production set up

2010-11-19 Thread Markus Jelsma
Please stay on the list.

Anyway, it's a matter of not exposing certain request handlers to the public.
If you have a master/slave setup, you can remove the update handlers from
your public-facing slave (or hide them behind HTTP auth in your proxy). The
same goes for other defined request handlers.

Essentially, you must know all about your defined request handlers in order to 
know whether they are secure or not.
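For instance, on a public-facing slave you might simply not define (or
comment out) the update handler in solrconfig.xml; a sketch with the stock
handler name:

  <!-- disabled on the public slave:
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  -->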

Cheers,

On Friday 19 November 2010 09:15:42 lee carroll wrote:
> Hi, thanks for the response.
> So if I follow what you are saying, for a public-facing index the standard
> pattern is to run behind a reverse proxy providing security (and caching?).
> Are there any docs on this? Or example deployment diagrams / config? Thanks
> lee c
> 
> On 18 Nov 2010 23:14, "Markus Jelsma"  wrote:
> > Hi,
> > 
> > It's a common practice not to use Solr as a frontend. Almost all deployed
> > instances live in the backend near the database servers. And if Solr is
> 
> being
> 
> > put to the front, it's still being secured by a proxy.
> > 
> > Setting up staging and production instances depend on your need. If the
> 
> load
> 
> > is small, you can run two Solr cores [1] on the same instance and if the
> 
> load
> 
> > is high you'd just separate them, the same goes for development and test
> > instances.
> > 
> > [1]: http://wiki.apache.org/solr/CoreAdmin
> > 
> > Cheers,
> > 
> >> Hi I'm pretty new to SOLR and interested in getting an idea about a
> 
> simple
> 
> >> standard way of setting up a production SOLR service. I have read the
> 
> FAQs
> 
> >> and the wiki around SOLR security and performance but have not found
> >> much on a best practice architecture. I'm particularly interested in
> >> best practices around DOS prevention, securing the SOLR web app and
> >> setting up dev, test, production indexes.
> >> 
> >> Any pointers, links to resources would be great. Thanks in advance
> >> 
> >> Lee C

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Doubts regarding Multiple Keyword Search

2010-11-19 Thread Lance Norskog
The analysis stack for the text type includes the PorterStemmer. A
'stemmer' uses algorithms to trim similar words back to a base word.
In this case, it removes 'ing' from the end of 'testing'. This means
that 'test' and 'testing' are indexed exactly the same, and searching
for one will find the other.

On Thu, Nov 18, 2010 at 11:29 PM, Pawan Darira  wrote:
> Hi
>
> I am searching for the keywords: ad testing (without quotes). I want results
> containing both words at the top. But it is giving me results containing
> the words: ad test. Is this correct, or is there some logic behind it, i.e.
> will it consider the word "test" as well?
>
> Please help
>
> --
> Thanks,
> Pawan Darira
>



-- 
Lance Norskog
goks...@gmail.com


Re: how about another SolrIndexSearcher.numDocs method?

2010-11-19 Thread kafka0102
The numDocs methods seem to be just for the filterCache. So I just need to use
search(QueryResult qr, QueryCommand cmd) with QueryCommand.len=0? I will
try it.
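Something like the following against the 1.4 API might do it (an untested
sketch; the helper class and method names are hypothetical):

  import java.io.IOException;
  import java.util.List;
  import org.apache.lucene.search.Query;
  import org.apache.solr.search.DocList;
  import org.apache.solr.search.SolrIndexSearcher;

  public class NumDocsUtil {
      // Returns the total hit count for q restricted by fqs; the fq list
      // goes through the filter cache, while q itself is executed directly.
      public static int numDocs(SolrIndexSearcher searcher, Query q, List<Query> fqs)
              throws IOException {
          SolrIndexSearcher.QueryCommand cmd = new SolrIndexSearcher.QueryCommand();
          cmd.setQuery(q);
          cmd.setFilterList(fqs);
          cmd.setLen(0); // only the match count is needed, not the documents
          SolrIndexSearcher.QueryResult qr = new SolrIndexSearcher.QueryResult();
          searcher.search(qr, cmd);
          DocList docs = qr.getDocList();
          return docs.matches();
      }
  }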

At 2010-11-19 15:49:31, kafka0102 wrote:
In my app, I want to get the number of matching docs for some queries. I see
that SolrIndexSearcher has two methods:
public int numDocs(Query a, DocSet b)
public int numDocs(Query a, Query b)

But these don't fit my case. As search params I get q and fq, and q's results
are not in the filterCache, but the above methods both use the filterCache. So
I think a method like:
public int numDocs(Query q, List fqs) (q not using the filterCache, fqs using
the filterCache)
would be fine.
And for now, I cannot extend SolrIndexSearcher because of SolrCore. What
should I do to solve the problem?
thanks.
thanks.



