Understanding RecoveryStrategy

2012-05-04 Thread Trym R. Møller

Hi

Using Solr trunk with the replica feature, I repeatedly see the exception 
below in the Solr log.
I have been looking into the code of RecoveryStrategy#commitOnLeader and 
I read it as follows:
1. it sends a commit request (with COMMIT_END_POINT=true) to the Solr 
instance containing the leader of the slice
2. it sends a plain commit request to the Solr instance containing the 
leader of the slice
The first results in a commit only on the shards in that single leader Solr 
instance; the second results in a commit on those shards plus on all other 
Solr instances hosting slices or replicas belonging to the collection.


I would expect the first request to be the relevant one (and enough to 
recover the specific replica).

Am I reading the second request wrong, or is it a bug?

The code I'm referring to is:

UpdateRequest ureq = new UpdateRequest();
ureq.setParams(new ModifiableSolrParams());
ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
1. ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);
2. server.commit();

Thanks in advance for any input.

Best regards Trym R. Møller

Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to 
recover:org.apache.solr.client.solrj.SolrServerException: 
http://myIP:8983/solr/myShardId
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
at 
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
at 
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)

... 8 more


Re: Invalid version expected 2, but 60 on CentOS

2012-05-04 Thread Ravi Solr
Thank you very much for responding, Mr. Miller. There are 5 different
apps deployed on the same server as SOLR, and all of them call SOLR via
SolrJ with localhost:8080/solr/sitecore as the constructor URL for
HttpSolrServer. Out of these 5 apps, only one has this
issue. If it were really the web server/container throwing 404s, then
it should happen to the other apps as well, as they all call the same
core. This is what makes me believe it's not just the web
server/container. Does that make sense?

Thanks,

Ravi Kiran

On Fri, May 4, 2012 at 4:28 PM, Mark Miller  wrote:
>
> On May 4, 2012, at 4:09 PM, Ravi Solr wrote:
>
>> Thanking you in anticipation,
>
> Generally this happens because the webapp server is returning an html error 
> response of some kind. Often it's a 404.
>
> I think in trunk this might have been addressed - that is, it's easier to see 
> the true error. Not positive though.
>
> Some non success html response is likely coming back though.
>
> - Mark Miller
> lucidimagination.com


Re: solr snapshots - old school and replication - new school ?

2012-05-04 Thread Lance Norskog
Yes. Replication is a lot easier to use and does a lot more.

On Thu, May 3, 2012 at 6:00 AM, geeky2  wrote:
> hello all,
>
> environment: CentOS and Solr 3.5
>
> I want to make sure I understand the difference between snapshots and Solr
> replication.
>
> Snapshots are "old school" and have been deprecated in favor of Solr
> replication, the "new school".
>
> Do I have this correct?
>
> BTW: I have replication working (now) between my master and two slaves - I
> just want to make sure I am not missing a larger picture ;)
>
> I have been reading the Smiley/Pugh book (pg. 349) as well as material on the
> wiki at:
>
> http://wiki.apache.org/solr/SolrCollectionDistributionScripts
>
> http://wiki.apache.org/solr/SolrReplication
>
>
> thank you,
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-snapshots-old-school-and-replication-new-school-tp3959152.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: correct XPATH syntax

2012-05-04 Thread Lance Norskog
The XPath implementation in DIH is very minimal- it is tuned for
speed, not features. The XSL option lets you do everything you could
want, with a slower engine.

On Thu, May 3, 2012 at 7:30 AM, lboutros  wrote:
> ok, not that easy :)
>
> I did not test it myself, but it seems that you could use an XSL
> preprocessing step with the 'xsl' option in your XPathEntityProcessor:
>
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
> You could transform the author part as you wish and then import the author
> field with your actual configuration.
>
> Ludovic.
>
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: Phrase Slop problem

2012-05-04 Thread Lance Norskog
Maybe it could throw an exception because the user is clearly trying
to do something impossible.

On Wed, May 2, 2012 at 3:19 PM, Jack Krupansky  wrote:
> You are missing the "pf", "pf2", and "pf3" request parameters, which specify
> which fields to do phrase proximity boosting on.
>
> "pf" boosts using the whole query as a phrase, "pf2" boosts bigrams, and
> "pf3" boosts trigrams.
>
> You can use any combination of them, but if you use none of them, "ps"
> appears to be ignored.
>
> Maybe it should default to doing some boost if none of the field lists is
> given, like boosting bigrams in the "qf" fields, but it doesn't.
>
> -- Jack Krupansky
>
> -Original Message- From: André Maldonado
> Sent: Wednesday, May 02, 2012 3:29 PM
> To: solr-user@lucene.apache.org
> Subject: Phrase Slop problem
>
>
> Hi all.
>
> In my index I have a multivalued field that contains a lot of information;
> all text searches are based on it. So, when I do:
>
> http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25
>
> I get the same result as with:
>
> http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=0&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25
>
> and the same result with:
>
> http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=10&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25
>
> What am I doing wrong?
>
> Thanks
>
> --
> "And you shall know the truth, and the truth shall set you free." (John 8:32)
>
> andre.maldonado@gmail.com
> (11) 9112-4227



-- 
Lance Norskog
goks...@gmail.com
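(For illustration, a hedged variant of André's query with a phrase field added, so that ps actually has an effect - this assumes textoboost is the field you want phrase-proximity boosting on, with the host/port elided as in the original:

http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&qf=textoboost&pf=textoboost&ps=10&defType=edismax&mm=100%25

With pf set, documents where the three terms occur close together as a phrase are boosted, so it is the ordering of results that changes, not numFound.)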


Re: Searching by location – What do I send to Solr?

2012-05-04 Thread Lance Norskog
You could just download the postal codes every day. To be nice, you could
request the HEAD of each file first and check whether it is new.

This is just a set of tables, which you denormalize and add to your
other fields.

There are other sources of polygonal shape data, but there is no
official Solr toolkit for querying inside an irregular polygon.
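For the "send query to Solr" step of the pipeline quoted below, a minimal SolrJ sketch - assuming the schema has a LatLonType field (here called "store"), that "server" is an existing SolrServer instance, and that the application (Python or Java) has already resolved "London" to a lat/long, e.g. from the geonames tables:

SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("{!geofilt}");   // standard spatial filter in 3.x
query.set("sfield", "store");         // assumed LatLonType field name
query.set("pt", "51.507,-0.128");     // lat,lon supplied by the application
query.set("d", "20");                 // radius in km
QueryResponse rsp = server.query(query);

Where the place-name-to-lat/long lookup happens is mostly an application decision; Solr only ever sees the numbers.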

On Thu, May 3, 2012 at 6:19 PM, Erick Erickson  wrote:
> The fact that they're python and java is largely beside the point I think.
> Solr just sees a URL, the fact that your Python app gets in there
> first and "does stuff" with the query wouldn't affect Solr at all.
>
> Also, I tend to like keeping Solr fairly lean so any work I can offload to
> the application I usually do.
>
> YMMV
>
> Best
> Erick
>
> On Thu, May 3, 2012 at 6:43 PM, Spadez  wrote:
>> I discounted geonames to start with but it actually looks pretty good. I may
>> be stretching the limit of my question here, but say I did go with geonames,
>> if I go back to my model and add a bit:
>>
>> Search for "London" -> Convert "London" to Long/Lat -> Send Query to
>> Solr -> Return Query
>>
>> Since my main website is coded in Python, but Solr works in Java, if I was
>> to create or use an existing script to allow me to convert "London" to
>> Long/Lat, would it make more sense for this operation to be done in Python
>> or Java?
>>
>> In Python it would integrate better with my website, but in Java it would
>> integrate better with Solr. Also would one language be more suitable or
>> faster for this kind of operation?
>>
>> Again, I might be pushing the boundaries of what I can ask on here, but if
>> anyone can chime in with their opinion I would really appreciate it.
>>
>> ~ James
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960666.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Merge during off peak times

2012-05-04 Thread Lance Norskog
Optimize takes a 'maxSegments' option. This tells it to stop when
there are N segments instead of just one.

If you use a very high mergeFactor and then call optimize with a sane
number like 50, it only merges the little teeny segments.
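In SolrJ that maps onto the three-argument optimize call - a hedged sketch, assuming "server" is an existing SolrServer instance:

// waitFlush=true, waitSearcher=true, maxSegments=50:
// merge down until at most 50 segments remain, leaving the big ones alone
server.optimize(true, true, 50);

The plain-HTTP equivalent is an update request with optimize=true&maxSegments=50.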

On Thu, May 3, 2012 at 8:28 PM, Shawn Heisey  wrote:
> On 5/2/2012 5:54 AM, Prakashganesh, Prabhu wrote:
>>
>> We have a fairly large scale system - about 200 million docs and fairly
>> high indexing activity - about 300k docs per day with peak ingestion rates
>> of about 20 docs per sec. I want to work out what a good mergeFactor setting
>> would be by testing with different mergeFactor settings. I think the default
>> of 10 might be high, I want to try with 5 and compare. Unless I know when a
>> merge starts and finishes, it would be quite difficult to work out the
>> impact of changing mergeFactor. I want to be able to measure how long merges
>> take, run queries during the merge activity and see what the response times
>> are etc..
>
>
> With a lot of indexing activity, if you are attempting to avoid large
> merges, I would think you would want a higher mergeFactor, not a lower one,
> and do occasional optimizes during non-peak hours.  With a small
> mergeFactor, you will be merging a lot more often, and you are more likely
> to encounter merges of already-merged segments, which can be very slow.
>
> My index is nearing 70 million documents.  I've got seven shards - six large
> indexes with about 11.5 million docs each, and a small index that I try to
> keep below half a million documents.  The small index contains the newest
> documents, between 3.5 and 7 days worth.  With this setup and the way I
> manage it, large merges pretty much never happen.
>
> Once a minute, I do an update cycle.  This looks for and applies deletions,
> reinserts, and new document inserts.  New document inserts happen only on
> the small index, and there are usually a few dozen documents to insert on
> each update cycle.  Deletions and reinserts can happen on any of the seven
> shards, but there are not usually deletions and reinserts on every update
> cycle, and the number of reinserts is usually very very small.  Once an
> hour, I optimize the small index, which takes about 30 seconds.  Once a day,
> I optimize one of the large indexes during non-peak hours, so every large
> index gets optimized once every six days.  This takes about 15 minutes,
> during which deletes and reinserts are not applied, but new document inserts
> continue to happen.
>
> My mergeFactor is set to 35.  I wanted a large value here, and this
> particular number has a side effect -- uniformity in segment filenames on
> the disk during full rebuilds.  Lucene uses a base-36 segment numbering
> scheme.  I usually end up with less than 10 segments in the larger indexes,
> which means they don't do merges.  The small index does do merges, but I
> have never had a problem with those merges going slowly.
>
> Because I do occasionally optimize, I am fairly sure that even when I do
> have merges, they happen with 35 very small segment files, and leave the
> large initial segment alone.  I have not tested this theory, but it seems
> the most sensible way to do things, and I've found that Lucene/Solr usually
> does things in a sensible manner.  If I am wrong here (using 3.5 and its
> improved merging), I would appreciate knowing.
>
> Thanks,
> Shawn
>



-- 
Lance Norskog
goks...@gmail.com


Re: how to present html content in browse

2012-05-04 Thread Lance Norskog
You need positions and offsets to do highlighting. A CharFilter does
not preserve positions.

I think you have to analyze the raw HTML with a different Analyzer, as
well as the stripper. I think this is how it works: use a new Analyzer
stack that uses the StandardAnalyzer, and the lower case filter and
stemmer/synonym etc. Now, store the HTML field with that text type.
You then search on the stripped field, but highlight from the raw
field with 'hl.fl'.

Here's the cool part: you do not actually need to index the raw HTML,
only store it. If you do not index a field, the Highlighter analyzes
the HTML when it needs the positions and offsets.
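A hedged SolrJ sketch of that query side, reusing the field names from earlier in this thread (text_stripped indexed for searching, text_html stored-only for display/highlighting) and an existing SolrServer instance "server":

SolrQuery q = new SolrQuery("text_stripped:solr");  // search the stripped, indexed field
q.setHighlight(true);
q.addHighlightField("text_html");                   // highlight from the stored raw HTML
q.setFields("id", "text_html");
QueryResponse rsp = server.query(q);
// rsp.getHighlighting() then holds the highlighted HTML snippets per document id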

On Fri, May 4, 2012 at 2:25 PM, okayndc  wrote:
> Okay, thanks for the info.
>
> On Fri, May 4, 2012 at 4:42 PM, Jack Krupansky wrote:
>
>> Evidently there was a problem with highlighting of HTML that is supposedly
>> fixed in Solr 3.6 and trunk:
>>
>> https://issues.apache.org/jira/browse/SOLR-42
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: okayndc
>> Sent: Friday, May 04, 2012 4:35 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: how to present html content in browse
>>
>> Is it possible to return the HTML field highlighted?
>>
>> On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky **
>> wrote:
>>
>>  1. The raw html field (call it, "text_html") would be a "string" type
>>> field that is "stored" but not "indexed". This is the field you direct DIH
>>> to output to. This is the field you would return in your search results
>>> with the HTML to be displayed.
>>>
>>> 2. The stripped field (call it, "text_stripped") would be a "text" type
>>> field (where "text" is a field type you add that uses the HTML strip char
>>> filter as shown below) that is not "stored" but is "indexed. Add a
>>> CopyField to your schema that copies from the raw html field to the
>>> stripped field (say, "text_html" to "text_stripped".)
>>>
>>> For reference on HTML strip (HTMLStripCharFilterFactory), see:
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>>> 
>>> >
>>>
>>>
>>> Which has:
>>>
>>> 
>>>  
>>>  
>>>  >> mapping="mapping-**
>>> ISOLatin1Accent.txt"/>
>>>  
>>>  
>>>  
>>>  
>>>
>>>  
>>> 
>>>
>>> Although, you might want to call that field type "text_stripped" to avoid
>>> confusion with a simple text field
>>>
>>> You can add HTMLStripCharFilterFactory to some other field type that you
>>> might want to use, but this "charFilter" needs to be before the
>>> "tokenizer". The "text" field type above is just an example.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: okayndc
>>> Sent: Friday, May 04, 2012 1:01 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: how to present html content in browse
>>>
>>>
>>> Hello,
>>>
>>> I'm having a hard time understanding this, and I had this same question.
>>>
>>> When using DIH should the HTML field be stored in the raw HTML string
>>> field
>>> or the stripped field?
>>> Also what source field(s) need to be copied and to what destination?
>>>
>>> Thanks
>>>
>>>
>>> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:
>>>
>>>  Make two fields, one with stores the stripped HTML and another that
>>>
 stores the parsed HTML. You can use  so that you do not
 have to submit the html page twice.

 You would mark the stripped field 'indexed=true stored=false' and the
 full text field the other way around. The full text field should be a
 String type.

 On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
 > I am indexing records from database using DIH. The content of my record
 is in
 > html format. When I use browse
 > I would like to show the content in html format, not in text format. >
 Any
 > ideas?
 >
 > --
 > View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html

 > Sent from the Solr - User mailing list archive at Nabble.com.



 --
 Lance Norskog
 goks...@gmail.com



>>>
>>



-- 
Lance Norskog
goks...@gmail.com


RE: elevate vs. select numFound results

2012-05-04 Thread Noordeen, Roxy
I modified my solrconfig.xml to:


dismax
explicit
true
0.01
content^2.0
15


1
*:*


elevator



Then I added the enableElevation=true parameter to my elevate URL:
http://mydomain:8181/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true&debugQuery=on&enableElevation=true

This made my /elevate parsed query match my /select query, and I got back the 
same numFound.

My parsed query:

+((DisjunctionMaxQuery((content:dwayn)~0.01) 
DisjunctionMaxQuery((content:rock)~0.01) 
DisjunctionMaxQuery((content:johnson)~0.01))~1) 
DisjunctionMaxQuery((content:"dwayn rock johnson"~15^2.0)~0.01)



But it would be nice to make "exclusive=true" work and get an empty result set 
back when there is no matching elevation query.
Are there any solrconfig settings to do so?




-Original Message-
From: Noordeen, Roxy [mailto:roxy.noord...@wwecorp.com] 
Sent: Friday, May 04, 2012 8:11 PM
To: solr-user@lucene.apache.org
Subject: RE: elevate vs. select numFound results

My actual problem is with elevate not working with "exclusive=true". I have a 
special pinned widget that has to display only the nodes defined in my 
elevate.xml, a kind of sponsored results.

If I define "game" in my elevate.xml and send "exclusive=true", I get only the 
elevated entries.
http://:8181/solr/elevate?q=game&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

But when I pass a word not defined in my elevate.xml and send 
"exclusive=true", I get almost the same results as the /select query.
http://:8181/solr/elevate?q=gamenotdefined&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

So I ended up using both elevate and select: if the numbers [numFound] MATCH in 
both requests, I assume the word does not exist in my elevate.xml, and I have 
to hide my pinned widget.
But in a few cases, my /elevate and /select do not return the same numFound. 
There are some differences in the numbers.

Is there a way to force "exclusive=true" to look only at elevate.xml entries 
and ignore the results from the default search?

Answers to your questions:

1. There is no exclude=true parameter set in my elevate.xml

2. There is no exclusive=true set in the URL

3. My elevate entry in solrconfig.xml:


string
elevate.xml





explicit


elevator




4. I am not sure how to verify a qf difference. I am using the raw schema.xml and 
solrconfig.xml shipped with the Drupal Solr module. I manage most of the Solr 
configs via the Drupal module, except that at query time I query Solr 
directly.




-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, May 04, 2012 5:44 PM
To: solr-user@lucene.apache.org
Subject: Re: elevate vs. select numFound results

Some ways that fewer docs might be returned by query elevation:

1. The "excude" option: exclude="true" in the xml file.
2. The "exclusive" request parameter: &exclusive=true in the URL. (Certainly 
not your case.)
3. The "exclusive" request parameter default set to "true" in "defaults" for 
the "/elevate" request handler in solrconfig.
4. Some other query-related parameters (e.g., "qf") are different between 
your "/select" and "/elevate" request handlers

Try adding &enableElevation=false to your URL for "/elevate", which should 
show you whether query elevation itself is affecting the number of docs, or 
if it must be some other parameters that are different between the two 
request handlers.

-- Jack Krupansky

-Original Message- 
From: roxy.noord...@wwecorp.com
Sent: Friday, May 04, 2012 3:21 PM
To: solr-user@lucene.apache.org
Subject: elevate vs. select numFound results

I need help understanding the difference in the numFound number in the 
result
when I execute two queries against my solr instance, one with the elevation
and one without. I have a simple elevate.xml file created and working and am
searching for terms that are not meant to be elevated.

Elevate query
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 125 in the result element of the XML

Select query
example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 154 in the result element of the XML

For many (most all) of my queries the numFound results are the same (both
with elevated query strings and with strings not in elevate.xml), but this
one is very different.

Should they be the same? Any idea what could make them different?
Thank you.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/elevate-vs-select-numFound-results-tp3963200.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-04 Thread Li Li
Not scoring by relevance and instead sorting by document id may speed it up a little.
I haven't done any test of this, but maybe you can give it a try: scoring
consumes some CPU time, and you just want to match and get the total count.
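For what it's worth, a minimal SolrJ sketch of that combination (rows=0 plus the sort-by-internal-docid idea above; the _docid_ sort is an assumption worth verifying on your version):

SolrQuery q = new SolrQuery("field:value");
q.setRows(0);                                    // fetch no documents, only the count
q.setSortField("_docid_", SolrQuery.ORDER.asc);  // skip relevance scoring
long total = server.query(q).getResults().getNumFound();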

On Wed, May 2, 2012 at 11:58 PM, vybe3142  wrote:
> I can achieve this by building a query with start and rows = 0, and using
> .getResults().getNumFound().
>
> Are there any more efficient approaches to this?
>
> Thanks
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Minor typo in example solrconfig: process of provided docuemnts

2012-05-04 Thread Jack Krupansky
I noticed this minor typo in the example solrconfig.xml for both 3.6 and trunk 
(as of 5/1):

An analysis handler that provides a breakdown of the analysis
process of provided docuemnts. This handler expects a (single)

“docuemnts” should be “documents”.

-- Jack Krupansky

Re: Single Index to Shards

2012-05-04 Thread Lance Norskog
If you are not using SolrCloud, splitting an index is simple:
1) copy the index
2) remove what you do not want via "delete-by-query"
3) Optimize!

#2 brings up a basic design question: you have to decide which
documents go to which shard. Mostly, people use a value generated by a
hash on the actual id - this allows you to assign docs evenly.

http://wiki.apache.org/solr/UniqueKey
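A hedged sketch of steps 2 and 3 in SolrJ, assuming the unique key field is called "id", that a hash of it decides which copy keeps a document, and that "server" points at one of the copies (this is illustrative, not a standard Solr feature):

// uses org.apache.solr.common.SolrDocument/SolrDocumentList and java.util collections
int numShards = 2;
int thisShard = 0;                        // 0 for the first copy, 1 for the second

SolrQuery q = new SolrQuery("*:*");
q.setFields("id");
q.setRows(1000);

List<String> toDelete = new ArrayList<String>();
int start = 0;
while (true) {
    q.setStart(start);
    SolrDocumentList page = server.query(q).getResults();
    if (page.isEmpty()) break;
    for (SolrDocument doc : page) {
        String id = (String) doc.getFieldValue("id");
        if ((id.hashCode() & Integer.MAX_VALUE) % numShards != thisShard) {
            toDelete.add(id);             // belongs to the other copy
        }
    }
    start += page.size();
    if (start >= page.getNumFound()) break;
}

if (!toDelete.isEmpty()) server.deleteById(toDelete);
server.commit();
server.optimize();                        // step 3: Optimize!

If the shard assignment can be expressed as a query (for example a stored shard_id field), a single deleteByQuery is of course much simpler.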

On Fri, May 4, 2012 at 4:28 PM, Young, Cody  wrote:
> You can also make a copy of your existing index, bring it up as a second 
> instance/core and then send delete queries to both indexes.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, May 04, 2012 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Single Index to Shards
>
> There's no way to split an _existing_ index into multiple shards, although 
> some of the work on SolrCloud is considering being able to do this. You have 
> a couple of choices here:
>
> 1> Just reindex everything from scratch into two shards
> 2> delete all the docs from your index that will go into shard 2 and just
>     index the docs for shard 2 in your new shard
>
> But I want to be sure you're on the right track here. You only need to shard 
> if your index contains "too many" documents for your hardware to produce 
> decent query rates. If you are getting (and I'm picking this number out of 
> thin air) 50 QPS on your hardware (i.e. you're not stressing memory
> etc) and just want to get to 150 QPS, use replication rather than sharding.
>
> see: http://wiki.apache.org/solr/SolrReplication
>
> Best
> Erick
>
> On Fri, May 4, 2012 at 9:44 AM, michaelsever  wrote:
>> If I have a single Solr index running on a Core, can I split it or
>> migrate it into 2 shards?
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.ht
>> ml Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


RE: elevate vs. select numFound results

2012-05-04 Thread Noordeen, Roxy
My actual problem is with elevate not working with "exclusive=true". I have a 
special pinned widget that has to display only the nodes defined in my 
elevate.xml, a kind of sponsored results.

If I define "game" in my elevate.xml and send "exclusive=true", I get only the 
elevated entries.
http://:8181/solr/elevate?q=game&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

But when I pass a word not defined in my elevate.xml and send 
"exclusive=true", I get almost the same results as the /select query.
http://:8181/solr/elevate?q=gamenotdefined&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

So I ended up using both elevate and select: if the numbers [numFound] MATCH in 
both requests, I assume the word does not exist in my elevate.xml, and I have 
to hide my pinned widget.
But in a few cases, my /elevate and /select do not return the same numFound. 
There are some differences in the numbers.

Is there a way to force "exclusive=true" to look only at elevate.xml entries 
and ignore the results from the default search?

Answers to your questions:

1. There is no exclude=true parameter set in my elevate.xml

2. There is no exclusive=true set in the URL

3. My elevate entry in solrconfig.xml:


string
elevate.xml





explicit


elevator




4. I am not sure how to verify a qf difference. I am using the raw schema.xml and 
solrconfig.xml shipped with the Drupal Solr module. I manage most of the Solr 
configs via the Drupal module, except that at query time I query Solr 
directly.




-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, May 04, 2012 5:44 PM
To: solr-user@lucene.apache.org
Subject: Re: elevate vs. select numFound results

Some ways that fewer docs might be returned by query elevation:

1. The "excude" option: exclude="true" in the xml file.
2. The "exclusive" request parameter: &exclusive=true in the URL. (Certainly 
not your case.)
3. The "exclusive" request parameter default set to "true" in "defaults" for 
the "/elevate" request handler in solrconfig.
4. Some other query-related parameters (e.g., "qf") are different between 
your "/select" and "/elevate" request handlers

Try adding &enableElevation=false to your URL for "/elevate", which should 
show you whether query elevation itself is affecting the number of docs, or 
if it must be some other parameters that are different between the two 
request handlers.

-- Jack Krupansky

-Original Message- 
From: roxy.noord...@wwecorp.com
Sent: Friday, May 04, 2012 3:21 PM
To: solr-user@lucene.apache.org
Subject: elevate vs. select numFound results

I need help understanding the difference in the numFound number in the 
result
when I execute two queries against my solr instance, one with the elevation
and one without. I have a simple elevate.xml file created and working and am
searching for terms that are not meant to be elevated.

Elevate query
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 125 in the result element of the XML

Select query
example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 154 in the result element of the XML

For many (most all) of my queries the numFound results are the same (both
with elevated query strings and with strings not in elevate.xml), but this
one is very different.

Should they be the same? Any idea what could make them different?
Thank you.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/elevate-vs-select-numFound-results-tp3963200.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Minor typo: None-hex character in unicode escape sequence

2012-05-04 Thread Jack Krupansky
I just happened to notice a typo when I mistyped a Unicode escape sequence in a 
query:

org.apache.lucene.queryparser.classic.ParseException: Cannot parse 
'sku:abc-0\ugabc0)': None-hex character in unicode escape sequence: g

“None-hex” should be “Non-hex”.

And “unicode” should be “Unicode”.

Same typo in both Solr 3.6 and trunk (as of 5/1).

My query:
http://localhost:8983/solr/select/?debugQuery=true&q=sku:abc-0\ugabc0) 

Dismax doesn’t get the error since apparently it doesn’t recognize Unicode 
escape sequences.

Edismax gets the same exact error as the Lucene query parser, which it uses.

Does this warrant a Jira?

-- Jack Krupansky

RE: Single Index to Shards

2012-05-04 Thread Young, Cody
You can also make a copy of your existing index, bring it up as a second 
instance/core and then send delete queries to both indexes.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, May 04, 2012 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Single Index to Shards

There's no way to split an _existing_ index into multiple shards, although some 
of the work on SolrCloud is considering being able to do this. You have a 
couple of choices here:

1> Just reindex everything from scratch into two shards
2> delete all the docs from your index that will go into shard 2 and just
   index the docs for shard 2 in your new shard

But I want to be sure you're on the right track here. You only need to shard if 
your index contains "too many" documents for your hardware to produce decent 
query rates. If you are getting (and I'm picking this number out of thin air) 
50 QPS on your hardware (i.e. you're not stressing memory
etc) and just want to get to 150 QPS, use replication rather than sharding.

see: http://wiki.apache.org/solr/SolrReplication

Best
Erick

On Fri, May 4, 2012 at 9:44 AM, michaelsever  wrote:
> If I have a single Solr index running on a Core, can I split it or 
> migrate it into 2 shards?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.ht
> ml Sent from the Solr - User mailing list archive at Nabble.com.


Re: elevate vs. select numFound results

2012-05-04 Thread Jack Krupansky

Some ways that fewer docs might be returned by query elevation:

1. The "excude" option: exclude="true" in the xml file.
2. The "exclusive" request parameter: &exclusive=true in the URL. (Certainly 
not your case.)
3. The "exclusive" request parameter default set to "true" in "defaults" for 
the "/elevate" request handler in solrconfig.
4. Some other query-related parameters (e.g., "qf") are different between 
your "/select" and "/elevate" request handlers


Try adding &enableElevation=false to your URL for "/elevate", which should 
show you whether query elevation itself is affecting the number of docs, or 
if it must be some other parameters that are different between the two 
request handlers.
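For example (hypothetical host, reusing the query from the original message), compare:

example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1&enableElevation=false
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1&enableElevation=true

If numFound only moves when enableElevation=true, elevation (or its exclude entries) is responsible; if it already differs from /select with elevation disabled, the two handlers' other parameters differ.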


-- Jack Krupansky

-Original Message- 
From: roxy.noord...@wwecorp.com

Sent: Friday, May 04, 2012 3:21 PM
To: solr-user@lucene.apache.org
Subject: elevate vs. select numFound results

I need help understanding the difference in the numFound number in the 
result

when I execute two queries against my solr instance, one with the elevation
and one without. I have a simple elevate.xml file created and working and am
searching for terms that are not meant to be elevated.

Elevate query
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
 for this the numFound is 125 in the result element of the XML

Select query
example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
 for this the numFound is 154 in the result element of the XML

For many (most all) of my queries the numFound results are the same (both
with elevated query strings and with strings not in elevate.xml), but this
one is very different.

Should they be the same? Any idea what could make them different?
Thank you.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/elevate-vs-select-numFound-results-tp3963200.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: how to present html content in browse

2012-05-04 Thread okayndc
Okay, thanks for the info.

On Fri, May 4, 2012 at 4:42 PM, Jack Krupansky wrote:

> Evidently there was a problem with highlighting of HTML that is supposedly
> fixed in Solr 3.6 and trunk:
>
> https://issues.apache.org/jira/browse/SOLR-42
>
>
> -- Jack Krupansky
>
> -Original Message- From: okayndc
> Sent: Friday, May 04, 2012 4:35 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
> Is it possible to return the HTML field highlighted?
>
> On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky **
> wrote:
>
>  1. The raw html field (call it, "text_html") would be a "string" type
>> field that is "stored" but not "indexed". This is the field you direct DIH
>> to output to. This is the field you would return in your search results
>> with the HTML to be displayed.
>>
>> 2. The stripped field (call it, "text_stripped") would be a "text" type
>> field (where "text" is a field type you add that uses the HTML strip char
>> filter as shown below) that is not "stored" but is "indexed. Add a
>> CopyField to your schema that copies from the raw html field to the
>> stripped field (say, "text_html" to "text_stripped".)
>>
>> For reference on HTML strip (HTMLStripCharFilterFactory), see:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> 
>> >
>>
>>
>> Which has:
>>
>> 
>>  
>>  
>>  > mapping="mapping-**
>> ISOLatin1Accent.txt"/>
>>  
>>  
>>  
>>  
>>
>>  
>> 
>>
>> Although, you might want to call that field type "text_stripped" to avoid
>> confusion with a simple text field
>>
>> You can add HTMLStripCharFilterFactory to some other field type that you
>> might want to use, but this "charFilter" needs to be before the
>> "tokenizer". The "text" field type above is just an example.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: okayndc
>> Sent: Friday, May 04, 2012 1:01 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: how to present html content in browse
>>
>>
>> Hello,
>>
>> I'm having a hard time understanding this, and I had this same question.
>>
>> When using DIH should the HTML field be stored in the raw HTML string
>> field
>> or the stripped field?
>> Also what source field(s) need to be copied and to what destination?
>>
>> Thanks
>>
>>
>> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:
>>
>>  Make two fields, one with stores the stripped HTML and another that
>>
>>> stores the parsed HTML. You can use  so that you do not
>>> have to submit the html page twice.
>>>
>>> You would mark the stripped field 'indexed=true stored=false' and the
>>> full text field the other way around. The full text field should be a
>>> String type.
>>>
>>> On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
>>> > I am indexing records from database using DIH. The content of my record
>>> is in
>>> > html format. When I use browse
>>> > I would like to show the content in html format, not in text format. >
>>> Any
>>> > ideas?
>>> >
>>> > --
>>> > View this message in context:
>>> http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>>> >
>>>
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>>
>>>
>>
>


Re: Facet and totaltermfreq

2012-05-04 Thread Jamie Johnson
it might be...can you provide an example of the request/response?

On Fri, May 4, 2012 at 3:31 PM, Dmitry Kan  wrote:
> I have tried (as a test) combining facets and term vectors (
> http://wiki.apache.org/solr/TermVectorComponent ) in one query and was able
> to get a list of facets and for each facet there was a term freq under
> termVectors section. Not sure, if that's what you are trying to achieve.
>
> -Dmitry
>
> On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson  wrote:
>
>> Is it possible when faceting to return not only the strings but also
>> the total term frequency for those facets?  I am trying to avoid
>> building a customized faceting component and making multiple queries.
>> In our scenario we have multivalued fields which may have duplicates
>> and I would like to be able to get a count of how many documents that
>> term appears (currently what faceting does) but also how many times
>> that term appears in general.
>>
>
>
>
> --
> Regards,
>
> Dmitry Kan


Re: how to present html content in browse

2012-05-04 Thread Jack Krupansky
Evidently there was a problem with highlighting of HTML that is supposedly 
fixed in Solr 3.6 and trunk:


https://issues.apache.org/jira/browse/SOLR-42

-- Jack Krupansky

-Original Message- 
From: okayndc

Sent: Friday, May 04, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Re: how to present html content in browse

Is it possible to return the HTML field highlighted?

On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky 
wrote:



1. The raw html field (call it, "text_html") would be a "string" type
field that is "stored" but not "indexed". This is the field you direct DIH
to output to. This is the field you would return in your search results
with the HTML to be displayed.

2. The stripped field (call it, "text_stripped") would be a "text" type
field (where "text" is a field type you add that uses the HTML strip char
filter as shown below) that is not "stored" but is "indexed. Add a
CopyField to your schema that copies from the raw html field to the
stripped field (say, "text_html" to "text_stripped".)

For reference on HTML strip (HTMLStripCharFilterFactory), see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Which has:


 
  
  
  
  
  
  
 


Although, you might want to call that field type "text_stripped" to avoid
confusion with a simple text field

You can add HTMLStripCharFilterFactory to some other field type that you
might want to use, but this "charFilter" needs to be before the
"tokenizer". The "text" field type above is just an example.

-- Jack Krupansky

-Original Message- From: okayndc
Sent: Friday, May 04, 2012 1:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to present html content in browse


Hello,

I'm having a hard time understanding this, and I had this same question.

When using DIH should the HTML field be stored in the raw HTML string 
field

or the stripped field?
Also what source field(s) need to be copied and to what destination?

Thanks


On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:

 Make two fields, one with stores the stripped HTML and another that

stores the parsed HTML. You can use  so that you do not
have to submit the html page twice.

You would mark the stripped field 'indexed=true stored=false' and the
full text field the other way around. The full text field should be a
String type.

On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
> I am indexing records from database using DIH. The content of my record
is in
> html format. When I use browse
> I would like to show the content in html format, not in text format. 
> Any

> ideas?
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
> Sent from the Solr - User mailing list archive at Nabble.com.



--
Lance Norskog
goks...@gmail.com








Re: how to present html content in browse

2012-05-04 Thread okayndc
Is it possible to return the HTML field highlighted?

On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky wrote:

> 1. The raw html field (call it, "text_html") would be a "string" type
> field that is "stored" but not "indexed". This is the field you direct DIH
> to output to. This is the field you would return in your search results
> with the HTML to be displayed.
>
> 2. The stripped field (call it, "text_stripped") would be a "text" type
> field (where "text" is a field type you add that uses the HTML strip char
> filter as shown below) that is not "stored" but is "indexed. Add a
> CopyField to your schema that copies from the raw html field to the
> stripped field (say, "text_html" to "text_stripped".)
>
> For reference on HTML strip (HTMLStripCharFilterFactory), see:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Which has:
>
> 
>  
>   
>   
>   
>   
>   
>   
>  
> 
>
> Although, you might want to call that field type "text_stripped" to avoid
> confusion with a simple text field
>
> You can add HTMLStripCharFilterFactory to some other field type that you
> might want to use, but this "charFilter" needs to be before the
> "tokenizer". The "text" field type above is just an example.
>
> -- Jack Krupansky
>
> -Original Message- From: okayndc
> Sent: Friday, May 04, 2012 1:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
>
> Hello,
>
> I'm having a hard time understanding this, and I had this same question.
>
> When using DIH should the HTML field be stored in the raw HTML string field
> or the stripped field?
> Also what source field(s) need to be copied and to what destination?
>
> Thanks
>
>
> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:
>
>  Make two fields, one with stores the stripped HTML and another that
>> stores the parsed HTML. You can use  so that you do not
>> have to submit the html page twice.
>>
>> You would mark the stripped field 'indexed=true stored=false' and the
>> full text field the other way around. The full text field should be a
>> String type.
>>
>> On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
>> > I am indexing records from database using DIH. The content of my record
>> is in
>> > html format. When I use browse
>> > I would like to show the content in html format, not in text format. Any
>> > ideas?
>> >
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>


Re: Invalid version expected 2, but 60 on CentOS

2012-05-04 Thread Mark Miller

On May 4, 2012, at 4:09 PM, Ravi Solr wrote:

> Thanking you in anticipation,

Generally this happens because the webapp server is returning an html error 
response of some kind. Often it's a 404.

I think in trunk this might have been addressed - that is, it's easier to see 
the true error. Not positive though.

Some non success html response is likely coming back though.

- Mark Miller
lucidimagination.com













Invalid version expected 2, but 60 on CentOS

2012-05-04 Thread Ravi Solr
Hello,
We recently migrated our production SOLR 3.6 servers' OS
from Solaris to CentOS, and from then on we started seeing "Invalid
version (expected 2, but 60)" errors on one of the query servers
(oddly, one other query server seems fine). If we restart the
problematic server everything returns to normal, but the next morning
we get the same exception again. I made sure that all the
client applications are using the SOLR 3.6 version.

The Glassfish on which all the applications  and SOLR are deployed use
Java  1.6.0_29. The only difference I could see

1. The process indexing to the server having issues is using java1.6.0_31
2. The process indexing to the server that DOES NOT have issues is
using java1.6.0_29

Could the indexing process running a newer Java minor version than the
SOLR instance be the cause of this issue?

Can anybody please help me debug this a bit more ? what else can I
look at to understand the underlying problem. The stack trace is given
below


[#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
org.apache.solr.client.solrj.SolrServerException: Error executing query
   at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
   at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
   at 
xxx.xxx.xxx.FeedController.findLinksetNewsBySection(FeedController.java:743)
   at xxx.xxx.xxx.FeedController.findNewsBySection(FeedController.java:347)
   at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
   at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
   at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
   at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
   at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
   at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
   at 
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
   at 
org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
   at 
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
   at 
com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
   at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
   at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
   at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
   at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
   at 
org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365)
   at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask

Re: SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-04 Thread vybe3142
Fair enough, Thanks. Just wanted to confirm that there wasn't a better way of
accomplishing this.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322p3963295.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet and totaltermfreq

2012-05-04 Thread Dmitry Kan
I have tried (as a test) combining facets and term vectors (
http://wiki.apache.org/solr/TermVectorComponent ) in one query and was able
to get a list of facets and for each facet there was a term freq under
termVectors section. Not sure, if that's what you are trying to achieve.
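For reference, the combined request might look roughly like this (hypothetical field name; it assumes the handler has the TermVectorComponent registered and that the field is indexed with termVectors="true"):

http://localhost:8983/solr/select?q=*:*&rows=10&facet=true&facet.field=myfield&tv=true&tv.tf=true&fl=id

Note that the facet counts are per-term document counts, while the termVectors section gives per-document term frequencies for the returned rows only - so it approximates, but is not exactly, a collection-wide totaltermfreq.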

-Dmitry

On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson  wrote:

> Is it possible when faceting to return not only the strings but also
> the total term frequency for those facets?  I am trying to avoid
> building a customized faceting component and making multiple queries.
> In our scenario we have multivalued fields which may have duplicates
> and I would like to be able to get a count of how many documents that
> term appears (currently what faceting does) but also how many times
> that term appears in general.
>



-- 
Regards,

Dmitry Kan


elevate vs. select numFound results

2012-05-04 Thread roxy.noord...@wwecorp.com
I need help understanding the difference in the numFound number in the result
when I execute two queries against my solr instance, one with the elevation
and one without. I have a simple elevate.xml file created and working and am
searching for terms that are not meant to be elevated.

Elevate query
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 125 in the result element of the XML

Select query
example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
  for this the numFound is 154 in the result element of the XML

For many (most all) of my queries the numFound results are the same (both
with elevated query strings and with strings not in elevate.xml), but this
one is very different.

Should they be the same? Any idea what could make them different?
Thank you.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/elevate-vs-select-numFound-results-tp3963200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Template in a database field does not work. Please Help

2012-05-04 Thread RTI QA
Figured it out. I have to specify the column name incident_id in uppercase:



Looks like it is case sensitive for the transformer, even though to Oracle,
the column name is not case sensitive.
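(Presumably the working field definition, which the list archive stripped out above, ends up as something like template="inc-${incident.INCIDENT_ID}" - the entity name stays as defined in data-config.xml and only the column part is uppercased.)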

Thanks,
RTI QA


On Fri, May 4, 2012 at 1:44 PM, RTI QA  wrote:

> I specified template in a field
>
>  template="inc-${incident.incident_id}" />
>
> When doing full import, for each row retrieved from oracle, there is this
> output in the console:
>
> May 03, 2012 3:47:08 PM
> org.apache.solr.handler.dataimport.TemplateTransformer transformRow
>
> WARNING: Unable to resolve variable: incident.incident_id while parsing
> expression: inc-${incident.incident_id}
>
>
> Below is the data-config.xml file where the template is defined:
>
>
> 
>
>
>
>  url="jdbc:oracle:thin:@//dbtest:1521/ORCL" user="user" password="xxx"/>
>
>
>
>
>
> 
>
> 
>   transformer="TemplateTransformer"
>
>   query="select incident_id, ('inc-' || incident_id ) unique_id,
> long_desc from incident"
>
>   deltaQuery="select incident_id from incident where last_update
> > TO_DATE('${dataimporter.last_index_time}','-MM-DD HH24:MI:SS') "
>
>   >
>
>
>
> 
>
>  template="inc-${incident.incident_id}" />
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
>
>
> Have tried to change the template to
>
>
> template="inc-${incident_id}"
>
>
> Still no luck, similar error.
>
>
> Don't know what the TemplateTransformer is looking for to match the
> variable.
>
>
> Thanks,
>
> RTI QA
>


Template in a database field does not work. Please Help

2012-05-04 Thread RTI QA
I specified template in a field



When doing full import, for each row retrieved from oracle, there is this
output in the console:

May 03, 2012 3:47:08 PM
org.apache.solr.handler.dataimport.TemplateTransformer transformRow

WARNING: Unable to resolve variable: incident.incident_id while parsing
expression: inc-${incident.incident_id}


Below is the data-config.xml file where the template is defined:



































Have tried to change the template to


template="inc-${incident_id}"


Still no luck, similar error.


Don't know what the TemplateTransformer is looking for to match the
variable.


Thanks,

RTI QA


Re: query keyword-tokenized fields with solrj

2012-05-04 Thread Jack Krupansky
You have an embedded space in your keyword value, which must be escaped 
somehow. So the actual query can be written as:


article:"L. 111-5-2"

or

article:L.\ 111-5-2

The latter is slightly prettier, I suppose.

I suppose you could use a wildcard:

article:L.*111-5-2
article:L.?111-5-2

If you want to make it uglier, that would be easy:

article:L.\u0020111-5-2
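Programmatically with SolrJ, a hedged alternative is to let ClientUtils (org.apache.solr.client.solrj.util.ClientUtils) do the escaping instead of hand-writing the backslashes:

String article = "L. 111-5-2";
// escapeQueryChars escapes whitespace and the query-syntax characters
SolrQuery query = new SolrQuery("article:" + ClientUtils.escapeQueryChars(article));
QueryResponse rsp = server.query(query);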

-- Jack Krupansky

-Original Message- 
From: G.Long

Sent: Friday, May 04, 2012 11:48 AM
To: solr-user@lucene.apache.org
Subject: query keyword-tokenized fields with solrj

Hi :)

In schema.xml I added a custom fieldType called keyword:







and a field called article :



Now I would like to query this field using solrj. I'm using the
following code:


SolrQuery query = new SolrQuery("article:L. 111-5-2");
QueryResponse rsp = server.query(query);
list = rsp.getResults();

Even though there is only one entry in my index with the value "L.
111-5-2" in the field "article", I get a lot of results because the
article value is not kept as a single token. I could change my string to
"article:\\"L. 111-5-2\\"", but I was wondering if there is any
prettier way to do that (programmatically with the SolrJ API)?

Gary 



Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-04 Thread Ravi Solr
Hello,
 We recently migrated our SOLR 3.6 server OS from Solaris
to CentOS, and from then on we started seeing "Invalid version
(expected 2, but 60)" errors on one of the query servers (oddly, one
other query server seems fine). If we restart the server having the
issue, everything returns to normal, but the next morning we
get the same exception again. I made sure that all the client applications
are using the SOLR 3.6 version.

The Glassfish on which all the applications  and SOLR are deployed use
Java  1.6.0_29. The only difference I could see

1. The process indexing to the server having issues is using java1.6.0_31
2. The process indexing to the server that DOES NOT have issues is
using java1.6.0_29

Could the indexing process running a newer Java minor version than the
SOLR instance be the cause of this issue?

Can anybody please help me debug this a bit more ? what else can I
look at to understand the underlying problem. The stack trace is given
below


[#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
at 
com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
at 
com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
at 
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at 
org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
at 
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
at 
com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at 
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
at 
org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
at 
org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601)
at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875)
at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365)
at 
com.sun

Re: how to present html content in browse

2012-05-04 Thread Jack Krupansky
1. The raw html field (call it, "text_html") would be a "string" type field 
that is "stored" but not "indexed". This is the field you direct DIH to 
output to. This is the field you would return in your search results with 
the HTML to be displayed.


2. The stripped field (call it, "text_stripped") would be a "text" type 
field (where "text" is a field type you add that uses the HTML strip char 
filter as shown below) that is not "stored" but is "indexed". Add a copyField 
to your schema that copies from the raw html field to the stripped field 
(say, "text_html" to "text_stripped").


For reference on HTML strip (HTMLStripCharFilterFactory), see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Which has (roughly):

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <charFilter class="solr.MappingCharFilterFactory"
       mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


Although, you might want to call that field type "text_stripped" to avoid 
confusion with a simple text field.


You can add HTMLStripCharFilterFactory to some other field type that you 
might want to use, but this "charFilter" needs to be before the "tokenizer". 
The "text" field type above is just an example.


-- Jack Krupansky

-Original Message- 
From: okayndc

Sent: Friday, May 04, 2012 1:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to present html content in browse

Hello,

I'm having a hard time understanding this, and I had this same question.

When using DIH should the HTML field be stored in the raw HTML string field
or the stripped field?
Also what source field(s) need to be copied and to what destination?

Thanks


On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:


Make two fields, one which stores the stripped HTML and another that
stores the parsed HTML. You can use <copyField> so that you do not
have to submit the html page twice.

You would mark the stripped field 'indexed=true stored=false' and the
full text field the other way around. The full text field should be a
String type.

On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
> I am indexing records from database using DIH. The content of my record
is in
> html format. When I use browse
> I would like to show the content in html format, not in text format. Any
> ideas?
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
> Sent from the Solr - User mailing list archive at Nabble.com.



--
Lance Norskog
goks...@gmail.com





Re: Faceting on a date field multiple times

2012-05-04 Thread SUJIT PAL
Hi Ian,

I believe you may be able to use a bunch of facet.query parameters, something 
like this:

facet.query=yourfield:[NOW-1DAY TO NOW]
facet.query=yourfield:[NOW-2DAY TO NOW-1DAY]
...
and so on.
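
As a rough sketch of the day/week/year buckets from the original question in
a single request (the field name and exact ranges are placeholders; rows=0 if
only the counts are needed):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
  &facet.query=yourfield:[NOW/DAY-1DAY TO NOW]
  &facet.query=yourfield:[NOW/DAY-7DAYS TO NOW/DAY-1DAY]
  &facet.query=yourfield:[NOW/YEAR-1YEAR TO NOW/YEAR]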

-sujit

On May 3, 2012, at 10:41 PM, Ian Holsman wrote:

> Hi.
> 
> I would like to be able to do a facet on a date field, but with different 
> ranges (in a single query).
> 
> for example. I would like to show
> 
> #documents by day for the last week - 
> #documents by week for the last couple of months
> #documents by year for the last several years.
> 
> is there a way to do this without hitting solr 3 times?
> 
> 
> thanks
> Ian



Re: >1MB file to Zookeeper

2012-05-04 Thread Yonik Seeley
On Fri, May 4, 2012 at 12:50 PM, Mark Miller  wrote:
>> And how should we detect if data is compressed when
>> reading from ZooKeeper?
>
> I was thinking we could somehow use file extensions?
>
> eg synonyms.txt.gzip - then you can use different compression algs depending 
> on the ext, etc.
>
> We would want to try and make it as transparent as possible though...

At first I thought about adding a marker to the beginning of a file, but
file extensions could work too, as long as the resource loader made it
transparent (i.e. code would just need to ask for synonyms.txt, but the
resource loader would search for synonyms.txt.gzip, etc, if the original
name was not found).

Hmmm, but this breaks down for things like watches - I guess that's
where putting the encoding inside the file would be a better option.
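
Just to make the extension-fallback idea concrete, a rough sketch (purely
illustrative - the Map below only stands in for data read from ZooKeeper;
this is not the real SolrZkClient API):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class ZkResourceSketch {
  static byte[] load(Map<String, byte[]> zkData, String name) throws IOException {
    byte[] raw = zkData.get(name);              // caller only asks for the plain name
    if (raw != null) return raw;
    byte[] gz = zkData.get(name + ".gzip");     // fall back to the compressed variant
    if (gz == null) throw new IOException(name + " not found");
    GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
    return out.toByteArray();                   // transparently gunzipped for the caller
  }
}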

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: how to present html content in browse

2012-05-04 Thread okayndc
Hello,

I'm having a hard time understanding this, and I had this same question.

When using DIH should the HTML field be stored in the raw HTML string field
or the stripped field?
Also what source field(s) need to be copied and to what destination?

Thanks


On Thu, May 3, 2012 at 10:15 PM, Lance Norskog  wrote:

> Make two fields, one which stores the stripped HTML and another that
> stores the parsed HTML. You can use <copyField> so that you do not
> have to submit the html page twice.
>
> You would mark the stripped field 'indexed=true stored=false' and the
> full text field the other way around. The full text field should be a
> String type.
>
> On Thu, May 3, 2012 at 1:04 PM, srini  wrote:
> > I am indexing records from database using DIH. The content of my record
> is in
> > html format. When I use browse
> > I would like to show the content in html format, not in text format. Any
> > ideas?
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: >1MB file to Zookeeper

2012-05-04 Thread Mark Miller

On May 3, 2012, at 8:30 AM, Markus Jelsma wrote:

> Hi.
> 
> Compression is a good suggestion. All large dictionaries are compressed well 
> below 1MB with GZIP. Where should this be implemented? SolrZkClient or 
> ZkController?

Hmm...I'm not sure - we want to be careful with this feature. Offhand, I'd 
guess if we can get it in SolrZkClient that is the right level.

The main issue is that we don't want to compress by default - we want to do it 
based on size or request - because it's much harder to inspect the data in zk 
if it's compressed. We will probably want to add support to auto-uncompress in 
the Admin Zk view UI.

> Which good compressor is already in Solr's lib?

I don't know that we have one yet - though the benchmark contrib uses a lib for 
compression (commons-compress from Apache).

> And what's the 
> difference between SolrZkClient setData and create?

setData sets data on an existing node - create creates a new node (with or 
without data).

> Should it autocompress 
> files larger than N bytes?

This seems like a reasonable approach to me...

> And how should we detect if data is compressed when 
> reading from ZooKeeper?

I was thinking we could somehow use file extensions?

eg synonyms.txt.gzip - then you can use different compression algs depending on 
the ext, etc.

We would want to try and make it as transparent as possible though...

> 
> On Thursday 03 May 2012 14:04:31 Mark Miller wrote:
>> On May 3, 2012, at 5:15 AM, Markus Jelsma wrote:
>>> Hi,
>>> 
>>> We've increased Zookeepers znode size limit to accomodate for some larger
>>> dictionaries and other files. It isn't the best idea to increase the
>>> maximum znode size. Any plans for splitting up larger files and storing
>>> them with multi? Does anyone have another suggestion?
>>> 
>>> Thanks,
>>> Markus
>> 
>> Patches welcome :) You can compress, you can break up the files, or you can
>> raise the limit - that's about the options I know of.
>> 
>> You might start by creating a JIRA issue.
>> 
>> - Mark Miller
>> lucidimagination.com
> 
> -- 
> Markus Jelsma - CTO - Openindex

- Mark Miller
lucidimagination.com













Re: Documents With large number of fields

2012-05-04 Thread Darren Govoni
I'm also interested in this. Same situation.

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
> Hi,
> 
> My data model consist of different types of data. Each data type has its own 
> characteristics
> 
> If I include the unique characteristics of each type of data, my single Solr 
> Document could end up containing 300-400 fields.
> 
> In order to drill down to this data set I would have to provide faceting on 
> most of these fields so that I can drilldown to very small set of
> Documents.
> 
> Here are some of the questions :
> 
> 1) What's the best approach when dealing with documents with large number of 
> fields .
> Should I keep a single document with large number of fields or split my
> document into a number of smaller  documents where each document would 
> consist of some fields
> 
> 2) From an operational point of view, what's the drawback of having a single 
> document with a very large number of fields.
> Can Solr support documents with large number of fields (say 300 to 400).
> 
> 
> Thanks.
> 
> Regards,
> 
> Nitin Keswani
> 




query keyword-tokenized fields with solrj

2012-05-04 Thread G.Long

Hi :)

In schema.xml I added a custom fieldType called keyword:







and a field called article :
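
A minimal sketch of what those two definitions might look like (the attribute
values here are guesses; the key part is solr.KeywordTokenizerFactory, which
keeps the whole value as a single token):

<fieldType name="keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="article" type="keyword" indexed="true" stored="true"/>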



Now I would like to query this field using solrj. I'm using the 
following code:



SolrQuery query = new SolrQuery("article:L. 111-5-2");
QueryResponse rsp = server.query(query);
list = rsp.getResults();

Even though there is only one entry in my index with the value "L. 
111-5-2" in the field "article" I get a lot of results because the 
article value is not kept as a single token. I could change my string as 
"article:\\"L. 111-5-2\\"" but I was wondering if there could be any 
prettier way to do that (programmatically with the solrj api) ?


Gary


Re: Single Index to Shards

2012-05-04 Thread Erick Erickson
There's no way to split an _existing_ index into multiple shards, although
some of the work on SolrCloud is considering being able to do this. You
have a couple of choices here:

1> Just reindex everything from scratch into two shards
2> delete all the docs from your index that will go into shard 2 and just
 index the docs for shard 2 in your new shard (see the sketch below)
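
A rough SolrJ sketch of option 2> (exception handling omitted; the "id" range
is only a placeholder for whatever rule decides which documents move to the
new shard):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

SolrServer shard1 = new CommonsHttpSolrServer("http://host1:8983/solr");
// Drop the documents that will live on the new shard from the existing index...
shard1.deleteByQuery("id:[m TO *]");   // placeholder partitioning rule
shard1.commit();
// ...then re-index those same documents into the new shard's core.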

But I want to be sure you're on the right track here. You only need to shard
if your index contains "too many" documents for your hardware to produce
decent query rates. If you are getting (and I'm picking this number out
of thin air) 50 QPS on your hardware (i.e. you're not stressing memory
etc) and just want to get to 150 QPS, use replication rather than sharding.

see: http://wiki.apache.org/solr/SolrReplication

Best
Erick

On Fri, May 4, 2012 at 9:44 AM, michaelsever  wrote:
> If I have a single Solr index running on a Core, can I split it or migrate it
> into 2 shards?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with date searching.

2012-05-04 Thread Erick Erickson
Right, you need to do the explicit qualification of the date field.
dismax parsing is intended to work with text-type fields, not
numeric or date fields. If you attach &debugQuery=on, you'll
see that your "scanneddate" field is just dropped.

Furthermore, dismax was never intended to work with range
queries. Note this from the DisMaxQParserPlugin page:

" extremely simplified subset of the Lucene QueryParser syntax"

I'll expand on this a bit on the Wiki page.


Best
Erick

On Fri, May 4, 2012 at 6:45 AM, Dmitry Kan  wrote:
> unless, something else is wrong, my question would be, if you have the
> documents in solr stamped with these dates?
> also could try for a test specifying the field name directly:
>
> q=scanneddate:["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]
>
> also, in your first e-mail you said you have used
>
> [*"2012-02-02T01:30:52Z" TO "2012-02-02T01:30:52Z"*]
>
> with asterisks *, what scanneddate values did you then get?
>
> On Fri, May 4, 2012 at 1:37 PM, ayyappan  wrote:
>
>> thanks for quick response.
>>
>>  I tried your advice .  ["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]
>> like that even though i am not getting any result .
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
>
> Dmitry Kan


Documents With large number of fields

2012-05-04 Thread Keswani, Nitin - BLS CTR
Hi,

My data model consists of different types of data. Each data type has its own 
characteristics.

If I include the unique characteristics of each type of data, my single Solr 
Document could end up containing 300-400 fields.

In order to drill down to this data set I would have to provide faceting on 
most of these fields so that I can drilldown to very small set of
Documents.

Here are some of the questions :

1) What's the best approach when dealing with documents with a large number of 
fields?
Should I keep a single document with a large number of fields, or split my
document into a number of smaller documents where each document would 
consist of some of the fields?

2) From an operational point of view, what's the drawback of having a single 
document with a very large number of fields?
Can Solr support documents with a large number of fields (say 300 to 400)?


Thanks.

Regards,

Nitin Keswani



RE: Single Index to Shards

2012-05-04 Thread Keswani, Nitin - BLS CTR
Yes you can split your index into multiple shards

More info on shards can be found here : 

http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding

Thanks.

Regards,

Nitin Keswani


-Original Message-
From: michaelsever [mailto:sever_mich...@bah.com] 
Sent: Friday, May 04, 2012 9:44 AM
To: solr-user@lucene.apache.org
Subject: Single Index to Shards

If I have a single Solr index running on a Core, can I split it or migrate it 
into 2 shards?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: search case: Elision and truncate in french

2012-05-04 Thread Jack Krupansky
Okay, the issue is that only *some* of the filters are "multi-term aware" 
and the elision filter is one that is NOT multi-term aware.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Friday, May 04, 2012 9:42 AM
To: solr-user@lucene.apache.org
Subject: Re: search case: Elision and truncate in french

Well, if it was "fixed", then it is now broken again - in the 3.6 release!
Here’s a snippet from debugQuery showing that the generated query has the
elision intact in the analyzed term:

text_fr:l'avion*
text_fr:l'avion*
+text_fr:l'avion*
+text_fr:l'avion*

And for the same term without wildcard:

text_fr:l'avion
text_fr:l'avion
+text_fr:avion
+text_fr:avion

-- Jack Krupansky

-Original Message- 
From: Erik Hatcher

Sent: Friday, May 04, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: search case: Elision and truncate in french

Jack - that was true, until Solr 3.6+:


So, Claire, it's possible with the latest Solr release, to do this using
bits and pieces of your existing analysis chain.

As Jack said, though, this is a manual chore in pre-Solr-3.6 releases.

Erik


On May 4, 2012, at 08:54 , Jack Krupansky wrote:

Unfortunately, use of a wildcard causes the normal token analysis 
processing to be completely bypassed, including the elision filter.  So, 
when using a wildcard you have to simulate in your head all of the 
analysis features, such as manually performing the elision.


-- Jack Krupansky

-Original Message- From: Claire Hernandez
Sent: Friday, May 04, 2012 5:08 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Druart
Subject: search case: Elision and truncate in french

Hi all,

I have a little problem, I don't find an easy configuration solution but
maybe my google search is wrong :)

- ElisionFilterFactory is enabled for searching and indexing analyzer.
- Index contains: *l'aventure*
=> when I search *l'avent** solr finds nothing

I would have a solution which doesn't look sexy: having another index
with a patternreplacecharfilterfactory wich removes all "'" in strings.

Some tips would be usefull.

Thanks,
Claire; 




Single Index to Shards

2012-05-04 Thread michaelsever
If I have a single Solr index running on a Core, can I split it or migrate it
into 2 shards?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search case: Elision and truncate in french

2012-05-04 Thread Jack Krupansky
Well, if it was "fixed", then it is now broken again - in the 3.6 release! 
Here’s a snippet from debugQuery showing that the generated query has the 
elision intact in the analyzed term:


text_fr:l'avion*
text_fr:l'avion*
+text_fr:l'avion*
+text_fr:l'avion*

And for the same term without wildcard:

text_fr:l'avion
text_fr:l'avion
+text_fr:avion
+text_fr:avion

-- Jack Krupansky

-Original Message- 
From: Erik Hatcher

Sent: Friday, May 04, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: search case: Elision and truncate in french

Jack - that was true, until Solr 3.6+: 



So, Claire, it's possible with the latest Solr release, to do this using 
bits and pieces of your existing analysis chain.


As Jack said, though, this is a manual chore in pre-Solr-3.6 releases.

Erik


On May 4, 2012, at 08:54 , Jack Krupansky wrote:

Unfortunately, use of a wildcard causes the normal token analysis 
processing to be completely bypassed, including the elision filter.  So, 
when using a wildcard you have to simulate in your head all of the 
analysis features, such as manually performing the elision.


-- Jack Krupansky

-Original Message- From: Claire Hernandez
Sent: Friday, May 04, 2012 5:08 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Druart
Subject: search case: Elision and truncate in french

Hi all,

I have a little problem, I don't find an easy configuration solution but
maybe my google search is wrong :)

- ElisionFilterFactory is enabled for searching and indexing analyzer.
- Index contains: *l'aventure*
=> when I search *l'avent** solr finds nothing

I would have a solution which doesn't look sexy: having another index
with a patternreplacecharfilterfactory wich removes all "'" in strings.

Some tips would be usefull.

Thanks,
Claire; 




Why would solr norms come up different from Lucene norms?

2012-05-04 Thread Benson Margulies
So, I've got some code that stores the same documents in a Lucene
3.5.0 index and a Solr 3.5.0 instance. It's only five documents.

For a particular field, the Solr norm is always 0.625, while the
Lucene norm is .5.

I've watched the code in NormsWriterPerField in both cases.

In Solr we've got .577, in naked Lucene it's .5.

I tried to check for boosts, and I don't see any non-1.0 document or
field boosts.

The Solr field is:




Re: search case: Elision and truncate in french

2012-05-04 Thread Erik Hatcher
Jack - that was true, until Solr 3.6+: 


So, Claire, it's possible with the latest Solr release, to do this using bits 
and pieces of your existing analysis chain.
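
For example, a rough sketch of what that could look like in Solr 3.6+ (the
field name, the articles file and the filters are placeholders from a typical
French setup; the explicit "multiterm" analyzer is the part that gets applied
to wildcard terms such as l'avent*):

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>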

As Jack said, though, this is a manual chore in pre-Solr-3.6 releases.

Erik


On May 4, 2012, at 08:54 , Jack Krupansky wrote:

> Unfortunately, use of a wildcard causes the normal token analysis processing 
> to be completely bypassed, including the elision filter.  So, when using a 
> wildcard you have to simulate in your head all of the analysis features, such 
> as manually performing the elision.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Claire Hernandez
> Sent: Friday, May 04, 2012 5:08 AM
> To: solr-user@lucene.apache.org
> Cc: Jonathan Druart
> Subject: search case: Elision and truncate in french
> 
> Hi all,
> 
> I have a little problem, I don't find an easy configuration solution but
> maybe my google search is wrong :)
> 
> - ElisionFilterFactory is enabled for searching and indexing analyzer.
> - Index contains: *l'aventure*
> => when I search *l'avent** solr finds nothing
> 
> I would have a solution which doesn't look sexy: having another index
> with a patternreplacecharfilterfactory wich removes all "'" in strings.
> 
> Some tips would be usefull.
> 
> Thanks,
> Claire; 



Re: search case: Elision and truncate in french

2012-05-04 Thread Jack Krupansky
Unfortunately, use of a wildcard causes the normal token analysis processing 
to be completely bypassed, including the elision filter.  So, when using a 
wildcard you have to simulate in your head all of the analysis features, 
such as manually performing the elision.


-- Jack Krupansky

-Original Message- 
From: Claire Hernandez

Sent: Friday, May 04, 2012 5:08 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Druart
Subject: search case: Elision and truncate in french

Hi all,

I have a little problem, I don't find an easy configuration solution but
maybe my google search is wrong :)

- ElisionFilterFactory is enabled for searching and indexing analyzer.
- Index contains: *l'aventure*
=> when I search *l'avent** solr finds nothing

I would have a solution which doesn't look sexy: having another index
with a patternreplacecharfilterfactory wich removes all "'" in strings.

Some tips would be usefull.

Thanks,
Claire; 



Re: get latest 50 documents the fastest way

2012-05-04 Thread Nagendra Nagarajayya
You can do this with Solr 4.0 with RankingAlgorithm 1.4.2. Please pass 
the below parameters to your search:


&age=latest&docs=50

For eg:

http://localhost:8983/solr/select/?q=*:*&age=latest&docs=50

This would inspect the latest 50 documents in real time and return 
results accordingly. Using *:* will not affect the performance and you 
will not need any additional ranking or sort, etc.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 5/1/2012 7:38 AM, Yuval Dotan wrote:

Hi Guys
We have a use case where we need to get the 50 *latest *documents that
match my query - without additional ranking,sorting,etc on the results.
My index contains 1,000,000,000 documents and i noticed that if the number
of found documents is very big (larger than 50% of the index size -
500,000,000 docs) than it takes more than 5 seconds to get the results even
with rows=50 parameter.
Is there a way to get the results faster?
Thanks
Yuval






Re: Word recognised in a search

2012-05-04 Thread Dmitry Kan
have you tried the HighlightComponent? hl=true&hl.fl=orig_text_field
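
For example (the field name is just a placeholder for whatever field stores
the original text):

http://localhost:8983/solr/select?q=ups&hl=true&hl.fl=orig_text_field
  &hl.simple.pre=<em>&hl.simple.post=</em>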

- Dmitry

On Fri, May 4, 2012 at 1:52 PM, mattia.martine...@gmail.com <
mattia.martine...@gmail.com> wrote:

> Hi.
>
> I'm making some searches using Apache SOLR 1.4, but I will upgrade to 3.6.
>
> When SOLR uses stemming, it is very difficult to know what are the
> words that are really found (for example, if I search "ups" SOLR find
> "up" too).
> I need to know that because I need to highlight founded words in the
> text, and I need to extract some strings from the source using that
> words.
>
> I hope I managed in explain my problem well :-)
>
> Could you help me, please?
>
> Thank you very much!
> Bye.
>



-- 
Regards,

Dmitry Kan


Re: Parent-Child relationship

2012-05-04 Thread Erick Erickson
See: https://issues.apache.org/jira/browse/LUCENE-3759

No time-frame mentioned though.

Best
Erick

On Fri, May 4, 2012 at 4:20 AM, tamanjit.bin...@yahoo.co.in
 wrote:
> Hi,
> As per my understanding the join is confined to a single core only and it is
> not possible to have joins between docs of different cores. Am I correct
> here? If yes, is there a possibility of having joins across cores anytime
> soon?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259p3961509.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting on a date field multiple times

2012-05-04 Thread Ian Holsman
Thanks Marc.
On May 4, 2012, at 8:52 PM, Marc Sturlese wrote:

> http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Faceting-on-a-date-field-multiple-times-tp3961282p3961865.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Word recognised in a search

2012-05-04 Thread mattia.martine...@gmail.com
Hi.

I'm making some searches using Apache SOLR 1.4, but I will upgrade to 3.6.

When SOLR uses stemming, it is very difficult to know which words were
really found (for example, if I search "ups", SOLR finds "up" too).
I need to know that because I need to highlight the found words in the
text, and I need to extract some strings from the source using those
words.

I hope I managed to explain my problem well :-)

Could you help me, please?

Thank you very much!
Bye.


Re: Faceting on a date field multiple times

2012-05-04 Thread Marc Sturlese
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceting-on-a-date-field-multiple-times-tp3961282p3961865.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with date searching.

2012-05-04 Thread Dmitry Kan
unless, something else is wrong, my question would be, if you have the
documents in solr stamped with these dates?
also could try for a test specifying the field name directly:

q=scanneddate:["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]

also, in your first e-mail you said you have used

[*"2012-02-02T01:30:52Z" TO "2012-02-02T01:30:52Z"*]

with asterisks *, what scanneddate values did you then get?

On Fri, May 4, 2012 at 1:37 PM, ayyappan  wrote:

> thanks for quick response.
>
>  I tried your advice .  ["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]
> like that even though i am not getting any result .
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Dmitry Kan


Re: problem with date searching.

2012-05-04 Thread ayyappan
thanks for quick response.

 I tried your advice .  ["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]
like that even though i am not getting any result .

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with date searching.

2012-05-04 Thread Dmitry Kan
you have dates in the wrong order in the second query. Try instead:

["2011-09-22T22:40:30Z" TO "2012-02-02T01:30:52Z"]

in general:

[start_date TO end_date]

Dmitry

On Fri, May 4, 2012 at 1:10 PM, ayyappan  wrote:

> Hi
>
>  I'm having a slight problem with date searching... if i give same date
> range in search query it seems to be working fine when try to give the
> different date range and i am not getting result.
>
> Ex :
> select/?defType=dismax&q=[*"2012-02-02T01:30:52Z" TO
> "2012-02-02T01:30:52Z"*]&qf=scanneddate
>
> i am getting result 
>
> if try different date range .
>
> ["2012-02-02T01:30:52Z" TO "2011-09-22T22:40:30Z"]
>
> there is no record at all .please help me the same.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Dmitry Kan


problem with date searching.

2012-05-04 Thread ayyappan
Hi 

  I'm having a slight problem with date searching... if I give the same date
range in the search query it seems to be working fine, but when I try to give
a different date range I am not getting any results.

Ex : 
select/?defType=dismax&q=[*"2012-02-02T01:30:52Z" TO
"2012-02-02T01:30:52Z"*]&qf=scanneddate

I am getting results.

If I try a different date range:

["2012-02-02T01:30:52Z" TO "2011-09-22T22:40:30Z"]

there are no records at all. Please help me with the same.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Advanced search with results matrix

2012-05-04 Thread Mikhail Khludnev
Hi,

have you considered joining your subqueries into a disjunction
(BooleanClause.Occur.SHOULD) and requesting
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting?
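
A sketch of what that single request could look like, with one facet.query per
combination and rows=0 so only the counts come back (this assumes a default
search field is configured and that the parameters are URL-encoded):

q=("SQL Server" OR SQL) OR ("Visual Basic" OR VB.NET) OR (Java AND JavaScript)
&rows=0&facet=true
&facet.query=("SQL Server" OR SQL)
&facet.query=("Visual Basic" OR VB.NET)
&facet.query=(Java AND JavaScript)
&facet.query=("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET)
&facet.query=("Visual Basic" OR VB.NET) AND (Java AND JavaScript)
&facet.query=("SQL Server" OR SQL) AND (Java AND JavaScript)
&facet.query=("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET) AND (Java AND JavaScript)

Each facet.query count then fills one row of the matrix from the original
question.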

On Fri, May 4, 2012 at 1:32 PM, Gnanakumar  wrote:

> > 1. If I understand correctly you just need to perform one query. Like so
> > (translated to propper syntax of course):
> >("SQL Server" OR SQL) OR ("Visual Basic" OR VB.NET) OR (Java AND
> > JavaScript)
>
> No, it's not just one single query, rather, as I've mentioned before, it's
> combination of searches with result count for each combination.  Explained
> in detail below:
> 1) ("SQL Server" OR SQL)
> 2) ("Visual Basic" OR VB.NET)
> 3) (Java AND JavaScript)
> 4) ("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET)
> 5) ("Visual Basic" OR VB.NET) AND (Java AND JavaScript)
> 6) ("SQL Server" OR SQL) AND (Java AND JavaScript)
> 7) ("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET) AND (Java AND
> JavaScript)
>
> Hope I made it clear.
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


RE: Advanced search with results matrix

2012-05-04 Thread Gnanakumar
> 1. If I understand correctly you just need to perform one query. Like so 
> (translated to propper syntax of course):
>("SQL Server" OR SQL) OR ("Visual Basic" OR VB.NET) OR (Java AND 
> JavaScript)

No, it's not just one single query, rather, as I've mentioned before, it's
combination of searches with result count for each combination.  Explained
in detail below:
1) ("SQL Server" OR SQL)
2) ("Visual Basic" OR VB.NET)
3) (Java AND JavaScript)
4) ("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET)
5) ("Visual Basic" OR VB.NET) AND (Java AND JavaScript)
6) ("SQL Server" OR SQL) AND (Java AND JavaScript)
7) ("SQL Server" OR SQL) AND ("Visual Basic" OR VB.NET) AND (Java AND
JavaScript)

Hope I made it clear.




search case: Elision and truncate in french

2012-05-04 Thread Claire Hernandez

Hi all,

I have a little problem: I can't find an easy configuration solution, but 
maybe my Google search is wrong :)


- ElisionFilterFactory is enabled for searching and indexing analyzer.
- Index contains: *l'aventure*
=> when I search *l'avent** solr finds nothing

I would have a solution which doesn't look sexy: having another index 
with a PatternReplaceCharFilterFactory which removes all "'" in strings.


Some tips would be useful.

Thanks,
Claire;


Re: Parent-Child relationship

2012-05-04 Thread tamanjit.bin...@yahoo.co.in
Hi,
As per my understanding the join is confined to a single core only and it is
not possible to have joins between docs of different cores. Am I correct
here? If yes, is there a possibility of having joins across cores anytime
soon?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259p3961509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 3.5 Index Optimization not producing single .cfs file

2012-05-04 Thread pravesh
Thanx Mike,

>If you really must have a CFS (how come?) then you can call
>TieredMergePolicy.setNOCFSRatio(1.0) -- not sure how/where this is
>exposed in Solr though. 

BTW, would this impact the search performance? I mean i was just trying few
random keyword searches(without sort and filters) on both the system(1.4.1
vs 3.5) and found that 3.5 searches takes longer time than the 1.4.1(around
10-20% slower). Haven't done any load test till now

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619p3961441.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Advanced search with results matrix

2012-05-04 Thread David Radunz

Hey Gnanam,

1. If I understand correctly you just need to perform one query. Like so 
(translated to proper syntax of course):
  ("SQL Server" OR SQL) OR ("Visual Basic" OR VB.NET) OR (Java AND 
JavaScript)
2. Every query you perform with Solr returns the 'results' count; if you 
ONLY want the results count, simply set rows to 0 (but I'm guessing you 
will want both the results and the count, to avoid 2 trips).
  - The 'results count' is here: <result name="response" numFound="..." start="0"/> (numFound being the count)


David


On 4/05/2012 4:46 PM, Gnanakumar wrote:

Hi,

First off, we're a happy user of Apache Solr v3.1 Enterprise search server,
integrated and successfully running in our LIVE Production server.

Now, we're enhancing our existing search feature in our web application as
explained below, that truly helps application users in making informed
decision before getting their search results:

There will be 3 textboxes provided and users can enter keyword phrases with
OR, AND combination within each textbox as shown below, for example:
Textbox 1: "SQL Server" OR SQL
Textbox 2: "Visual Basic" OR VB.NET
Textbox 3: Java AND JavaScript

If User clicks "Search" button, we want to present an intermediate or
"results matrix" page that would generate all possible combinations for 3
textboxes with how many records found for each combination as given below
(between combination it is AND operation).  This, as I said before, truly
helps application users in making informed decision/choice before getting
their search results:
+---------+----------------------+--------------------------+---------------------
 Matches  |      Textbox 1       |        Textbox 2         |      Textbox 3
+---------+----------------------+--------------------------+---------------------
   200    | "SQL Server" OR SQL  |                          |
   300    |                      | "Visual Basic" OR VB.NET |
   400    |                      |                          | Java AND JavaScript
   250    | "SQL Server" OR SQL  | "Visual Basic" OR VB.NET |
   350    |                      | "Visual Basic" OR VB.NET | Java AND JavaScript
   300    | "SQL Server" OR SQL  |                          | Java AND JavaScript
   100    | "SQL Server" OR SQL  | "Visual Basic" OR VB.NET | Java AND JavaScript
+---------+----------------------+--------------------------+---------------------
Only on clicking one of this "Matches" count will display actual results of
that particular search.

My questions are,
1) Do I need to run search separately for each combination or is it
possible to combine and obtain "results matrix" page by making "only" one
single call to  Apache Solr?  Or are they any plug-ins available
that provides functionality close to my use case?
2) How do I instruct Solr to return only count (not result) for the
search performed?
3) Any ideas/suggestions/approaches/resources are really appreciated
and welcomed

Regards,
Gnanam