Re: SOLRJ and SOLR compatibility
Am 27.02.2014 08:04, schrieb Shawn Heisey:
> On 2/26/2014 11:22 PM, Thomas Scheffler wrote:
>> I am one developer of a repository framework. We rely on the fact, that "SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1]
>>
>> This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. (SOLRJ 4.6.1, SOLR 4.6.0)
>>
>> We use SolrInputDocument from SOLRJ to index our documents (javabin). But as framework developer we are not in a role to force our users to update their SOLR server such often. Instead with every new version we want to update just the SOLRJ library we ship with to enable latest features, if the user wishes.
>>
>> When I send a query to a request handler I can attach a "version" parameter to tell SOLR which version of the response format I expect.
>>
>> Is there such a configuration when indexing SolrInputDocuments? I did not find it so far.
>
> What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any.

Actually, bug reports reach me that sound like "Unknown type 19". I am currently not able to reproduce it myself with server versions 4.5.0, 4.5.1 and 4.6.0 when using SolrJ 4.6.1. It seems to be the same issue as described here:

http://lucene.472066.n3.nabble.com/After-upgrading-indexer-to-SolrJ-4-6-1-o-a-solr-servlet-SolrDispatchFilter-Unknown-type-19-td4116152.html

The solution there was to upgrade the server to version 4.6.1. This helped here, too, but out there it is a very unpopular decision. Some users have large SOLR installs and stick to a certain (4.x) version. They want upgrades from us, but upgrading company-wide SOLR installations is out of their scope.

Is this a known SOLRJ issue that is fixed in version 4.7.0?

Kind regards,
Thomas
Re: Column ambiguously defined error in SOLR delta import
On 2/26/2014 11:42 PM, Chandan khatua wrote: > I have the bellow query in data-config.xml, but it throws an error while > running the delta query: "java.sql.SQLSyntaxErrorException: ORA-00918: > column ambiguously defined". These are the FIRST two hits that I got when I searched for your full error string on Google: http://ora-918.ora-code.com/ http://www.techonthenet.com/oracle/errors/ora00918.php This error is coming from Oracle, not Solr. You did not include your deltaImportQuery. If you do not HAVE a deltaImportQuery defined, Solr will try to guess what it should be doing based on your main query and deltaQuery. As it says in the following wiki page, this is error-prone, and is likely to be the reason it's not working. http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config It's always possible that the real problem here is a bug in the Oracle JDBC driver. Less likely is a bug in Oracle itself. Thanks, Shawn
Re: SOLRJ and SOLR compatibility
On 2/26/2014 11:22 PM, Thomas Scheffler wrote: > I am one developer of a repository framework. We rely on the fact, that > "SolrJ generally maintains backwards compatibility, so you can use a > newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1] > > This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. > (SOLRJ 4.6.1, SOLR 4.6.0) > > We use SolrInputDocument from SOLRJ to index our documents (javabin). > But as framework developer we are not in a role to force our users to > update their SOLR server such often. Instead with every new version we > want to update just the SOLRJ library we ship with to enable latest > features, if the user wishes. > > When I send a query to a request handler I can attach a "version" > parameter to tell SOLR which version of the response format I expect. > > Is there such a configuration when indexing SolrInputDocuments? I did > not find it so far. What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any. A full discussion of this topic could fill a short novel. This reply is a little long, but hopefully digestible. I am assuming that you have a fair amount of familiarity with SolrJ here. If there's something you don't understand or seems wrong, we'll explore further. The javabin format changed between 1.4.1 and the next version (3.1) in a way that is incompatible in either direction, so mixing those versions requires using XMLResponseWriter. The javabin format has remained unchanged since version 3.1. Because Solr 1.x is very old and has the javabin incompatibility with later releases, I will not be discussing it beyond what I wrote above. You mentioned the version parameter. SolrJ automatically handles this in the requests it makes to Solr. You don't need to worry about it. One of the first things to say is that if you are using SolrCloud with the CloudSolrServer object, the only way that you can have any assurance of success with mixed versions is if your SolrJ version is newer than your Solr version, and I would not be assured very far unless the minor version is the same between the two. SolrCloud is evolving at an incredible pace. As far as I know, *backwards* compatibility is pretty good, but I would not be surprised to learn that there are some hiccups. I don't have a lot of experience with CloudSolrServer yet. Cross-version compatibility with non-cloud setups is MUCH better. A non-cloud setup is assumed for the rest of this email. I think it's important to mention that ConcurrentUpdateSolrServer and its predecessor StreamingUpdateSolrServer are usually not a good choice, unless you don't care about error handling. These classes do NOT inform the calling application of any error that occurs when sending updates to Solr. Rather than rely on one of these methods for making requests in parallel, your application should be multithreaded and send parallel requests itself with HttpSolrServer, which is completely threadsafe. If you're mixing 3.x and 4.x versions, stick to the xml REQUEST writer. This is the default in all but the most recent versions of SolrJ, but it's actually a good idea to explicitly set the writer object so you won't be surprised by an upgrade. You can use the binary RESPONSE writer (which is the default in all versions) with no problem. If both versions are 4.x, binary is fine for both the request writer and the response writer, and for performance reasons, is the preferred choice. 
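For reference, a minimal SolrJ sketch of pinning the writers explicitly (the URL and core name are placeholders):

import org.apache.solr.client.solrj.impl.BinaryResponseParser;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.RequestWriter;

// RequestWriter is the XML request writer; BinaryRequestWriter is the javabin one.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
server.setRequestWriter(new RequestWriter());  // XML requests: safe when mixing 3.x and 4.x
server.setParser(new BinaryResponseParser());  // javabin responses: the default, fine across versions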
In non-cloud setups, there are very few problems to be found with any combination of 4.x versions. Thanks. Shawn
Re: Know indexing time of a document
you could just add a field with default value NOW in schema.xml, for example On Wed, Feb 26, 2014 at 10:44 PM, pratpor wrote: > Is it possible to know the indexing time of a document in solr. Like there > is > a implicit field for "score" which automatically gets added to a document, > is there a field that stores value of indexing time? > > Thanks > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Know indexing time of a document
None that I know of, but you can easily have a date field with default set to NOW. Or you can have an UpdateRequestProcessor that adds it in: http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 27, 2014 at 5:44 PM, pratpor wrote: > Is it possible to know the indexing time of a document in solr. Like there is > a implicit field for "score" which automatically gets added to a document, > is there a field that stores value of indexing time? > > Thanks > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html > Sent from the Solr - User mailing list archive at Nabble.com.
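For example, a minimal schema.xml sketch of the default-value approach (the field name "index_time" is an assumption; it presumes the stock "date" field type is defined):

<field name="index_time" type="date" indexed="true" stored="true" default="NOW" />

Any document indexed without an explicit value for that field gets the wall-clock time of indexing.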
Know indexing time of a document
Is it possible to know the indexing time of a document in Solr? Like there is an implicit field for "score" which automatically gets added to a document, is there a field that stores the value of the indexing time? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html Sent from the Solr - User mailing list archive at Nabble.com.
Column ambiguously defined error in SOLR delta import
Hi, I have the below query in data-config.xml, but it throws an error while running the delta query: "java.sql.SQLSyntaxErrorException: ORA-00918: column ambiguously defined". Full data import is running fine. Kindly suggest the changes required. Thanking you, -Chandan
Searching with special chars
Hello, We are facing some kinda weird problem. So here is the scenario: We have a frontend and a middleware which is dealing with user input search queries before posting to Solr. So when a user enters city:Frankenthal_(Pfalz) and then searches, there is no result although there are fields on some documents matching city:Frankenthal_(Pfalz). We are aware that we can escape those chars, but the middleware which is accepting queries is running on a Glassfish server, which refuses URLs with backslashes in them, hence using backslashes is not okay for posting the query. To make everyone clear about the system, it looks like: (PHP) -> Encoded JSON -> (Glassfish App - Middleware) -> Javabin -> Solr Any other ideas how to deal with queries with special chars like this one? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-with-special-chars-tp4120047.html Sent from the Solr - User mailing list archive at Nabble.com.
SOLRJ and SOLR compatibility
Hi, I am one developer of a repository framework. We rely on the fact that "SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1] This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. (SOLRJ 4.6.1, SOLR 4.6.0) We use SolrInputDocument from SOLRJ to index our documents (javabin). But as framework developers we are not in a position to force our users to update their SOLR server so often. Instead, with every new version we want to update just the SOLRJ library we ship with, to enable the latest features if the user wishes. When I send a query to a request handler I can attach a "version" parameter to tell SOLR which version of the response format I expect. Is there such a configuration when indexing SolrInputDocuments? I did not find it so far. Kind regards, Thomas [1] https://wiki.apache.org/solr/Solrj
Re: Solr cloud: Faceting issue on text field
Hi Jack, Ya, the requirement is like that. I also want to apply various filters on the field like shingle, pattern replace etc. That is why I am using the text field. (But for the above run these filters were not enabled.) The facet count is set to 10 and the unique terms can go into thousands. Regards, On Wed, Feb 26, 2014 at 6:33 PM, Jack Krupansky wrote: > Are you sure you want to be faceting on a text field, as opposed to a > string field? I mean, each term (word) from the text will be a separate > facet value. > > How many facet values do you typically returning? > > How many unique terms occur in the facet field? > > -- Jack Krupansky > > -Original Message- From: David Miller > Sent: Wednesday, February 26, 2014 2:06 PM > To: solr-user@lucene.apache.org > Subject: Solr cloud: Faceting issue on text field > > > Hi, > > I am encountering an issue where Solr nodes goes down when trying to obtain > facets on a text field. The cluster consists of a few servers and have > around 200 million documents (small to medium). I am trying the faceting > first time on this field and it gives a 502 Bad Gateway error along with > some of the nodes going down and solr getting generally slow. > > The text field can have few words to a few thousand words. The Solr version > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > logs, Zookeeper was giving an EndOfStreamException > > Any hint on this will be helpful. > > Thanks & Regards, >
Re: Tracing Solr Query Execution and Performance
Thanks, Jack. I will file a Jira then. What are the generic ways to improve/tune a Solr query if we know it's expensive? Does the analysis page help with this at all? On Wed, Feb 26, 2014 at 3:39 PM, Jack Krupansky wrote: > I don't recall seeing anything related to passing the debug/debugQuery > parameters on for inter-node shard queries and then add that to the > aggregated response (if debug/debugQuery was specified.) Sounds worth a > Jira. > > -- Jack Krupansky > > -Original Message- From: KNitin > Sent: Wednesday, February 26, 2014 5:25 PM > To: solr-user@lucene.apache.org > Subject: Tracing Solr Query Execution and Performance > > > Hi there > > I have a few very expensive queries (atleast thats what the QTime tells > me) that is causing high CPU problems on a few nodes. Is there a way where > I can "trace" or do an "explain" on the solr query to see where it spends > more time? More like profiling on a per sub query basis? > > I have tried using debug=timing as a part of the query and it gives me > stage level details (parsing, highlighting) but I need much more insights > into where a query is spending time on > > > Any help is much appreciated > > Thanks > Nitin >
Re: Fetching uniqueKey and other int quickly from documentCache?
You could try forcing things to go through function queries (via pseudo-fields): fl=field(id), field(myfield) If you're not requesting any stored fields, that *might* currently skip that step. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan wrote: > We fetch a large number of documents -- 1000+ -- for each search. Each > request fetches only the uniqueKey or the uniqueKey plus one secondary > integer key. Despite this, we find that we spent a sizable amount of time > in SolrIndexSearcher#doc(int docId, Set fields). Time is spent > fetching the two stored fields, LZ4 decoding, etc. > > I would love to be able to tell Solr to always fetch these two fields from > memory. We have them both in the fieldCache so we're already spending the > RAM. I've seen this asked previously [1], so it seems like a fairly common > need, especially for distributed search. Any ideas? > > A few possible ideas I had: > > --Check FieldCache.html#getCacheEntries() before going to stored fields. > --Give the documentCache config a list of fields it should load from the > fieldCache > > > Having an in-memory mapping from docId->uniqueKey has come up for us > before. We've used a custom SolrCache maintaining that mapping to quickly > filter over personalized collections. Maybe the uniqueKey should be more > optimized out of the box? Perhaps a custom "uniqueKey" codec that also > maintained the docId->uniqueKey mapping in memory? > > --Gregg > > [1] http://search-lucene.com/m/oCUKJ1heHUU1
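A rough SolrJ equivalent of that fl (field names as in the example above; "server" is assumed to be an existing HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.setRows(1000);
q.setFields("field(id)", "field(myfield)");  // pseudo-fields via function queries
QueryResponse rsp = server.query(q);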
Re: Autocommit, opensearchers and ingestion
On Feb 26, 2014, at 5:24 PM, Joel Cohen wrote: > he's told me that he's doing commits in his SolrJ code > every 1000 items (configurable). Does that override my Solr server settings? Yes. Even if you have configured autocommit, explicit commits happen on demand and take effect immediately. Generally, clients should not send their own commits if you are using auto commit. If clients want to control this, it's best to set up hard auto commit and have clients use commitWithin for soft commits. It generally doesn't make sense for a client to make explicit hard commits with SolrCloud. - Mark http://about.me/markrmiller
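A minimal SolrJ sketch of the commitWithin approach (the URL is a placeholder; the same add() overload exists on CloudSolrServer):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
server.add(doc, 60000);  // visible within 60s via a soft commit; no explicit commit() from the client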
Re: Search score problem using bf edismax
The bf parameter adds the value of a function query to the document score. Your example did not include a bf parameter.

-- Jack Krupansky

-Original Message- From: Ing. Andrea Vettori Sent: Wednesday, February 26, 2014 12:26 PM To: solr-user@lucene.apache.org Subject: Search score problem using bf edismax

Hi, I'm new to Solr and I'm trying to understand why I don't get what I want with the bf parameter. The query debug information follows. What I don't understand is why the result of the bf parameter is so low in score compared to matched fields. Can anyone help? Thank you

[response header and the stored fields of the three matching documents elided from this stripped XML dump; the debug section follows]

iphone cavo

(+((DisjunctionMaxQuery((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone)) DisjunctionMaxQuery((categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo)))~2) FunctionQuery(1.0/(1.0E-9*float(int(rank1_8))+1.0)))/no_coord

+(((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone) (categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo))~2) 1.0/(1.0E-9*float(int(rank1_8))+1.0)

0.8545726 = (MATCH) sum of:
  0.82827055 = (MATCH) sum of:
    0.33939165 = (MATCH) max of:
      0.33939165 = (MATCH) weight(modello_s:iphone in 24160) [DefaultSimilarity], result of:
        0.33939165 = score(doc=24160,freq=1.0 = termFreq=1.0), product of:
          0.21819489 = queryWeight, product of:
            8.295743 = idf(docFreq=170, maxDocs=252056)
            0.02630203 = queryNorm
          1.5554519 = fieldWeight in 24160, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            8.295743 = idf(docFreq=170, maxDocs=252056)
            0.1875 = fieldNorm
Re: Parallel queries to Solr
Just send the queries to Solr in parallel using multiple threads in your application layer. Solr can handle multiple, parallel queries as separate, parallel requests, but does not have a way to bundle multiple queries on a single request. -- Jack Krupansky -Original Message- From: solr2020 Sent: Wednesday, February 26, 2014 4:40 PM To: solr-user@lucene.apache.org Subject: Parallel queries to Solr Hi, We want to send parallel queries(2-3 queries) in the same request from client to Solr. How to send the parallel queries from client side(using Solrj). Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Parallel-queries-to-Solr-tp4119959.html Sent from the Solr - User mailing list archive at Nabble.com.
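A hedged sketch of that client-side fan-out (queries and URL are placeholders; HttpSolrServer is threadsafe, so one instance can be shared):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

final HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
ExecutorService pool = Executors.newFixedThreadPool(3);
List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
for (String qs : new String[] { "title:foo", "title:bar", "title:baz" }) {
    final SolrQuery query = new SolrQuery(qs);
    futures.add(pool.submit(new Callable<QueryResponse>() {
        public QueryResponse call() throws Exception {
            return server.query(query);  // each query runs as its own request
        }
    }));
}
for (Future<QueryResponse> f : futures) {
    QueryResponse rsp = f.get();  // blocks until done; rethrows any per-query failure
}
pool.shutdown();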
Re: How does Solr parse schema.xml?
There is an existing Solr admin service to do that, which is what the Solr Admin UI uses to support that feature. For example: curl "http://localhost:8983/solr/analysis/field?analysis.fieldname=features&analysis.fieldvalue=Hello+World.&indent=true" There are some examples in the next (unpublished) release of my book (that's one of them.) That handler returns all token details, but if you wanted to roll your own, start there. The handler is: org.apache.solr.handler.FieldAnalysisRequestHandler -- Jack Krupansky -Original Message- From: Software Dev Sent: Wednesday, February 26, 2014 7:00 PM To: solr-user@lucene.apache.org Subject: How does Solr parse schema.xml? Can anyone point me in the right direction. I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer,tokenizer, etc but I can't seem to find the class that actually does the parsing of that file. Thanks
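Following up on that handler, a hedged SolrJ sketch of calling it (field name and value are from the curl example; the base URL is assumed):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
FieldAnalysisRequest req = new FieldAnalysisRequest("/analysis/field");
req.addFieldName("features");      // the field whose analysis chain to run
req.setFieldValue("Hello World.");
FieldAnalysisResponse rsp = req.process(server);
// rsp.getFieldNameAnalysis("features") then exposes the tokens per analysis phase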
Re: How does Solr parse schema.xml?
Check out org.apache.solr.schema.IndexSchema#readSchema(), which uses org.apache.solr.schema.FieldTypePluginLoader to parse analyzers. On Feb 26, 2014, at 7:00 PM, Software Dev wrote: > Can anyone point me in the right direction. I'm trying to duplicate the > functionality of the analysis request handler so we can wrap a service > around it to return the terms given a string of text. We would like to read > the same schema.xml file to configure the analyzer,tokenizer, etc but I > can't seem to find the class that actually does the parsing of that file. > > Thanks
How does Solr parse schema.xml?
Can anyone point me in the right direction. I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer,tokenizer, etc but I can't seem to find the class that actually does the parsing of that file. Thanks
Re: Tracing Solr Query Execution and Performance
I don't recall seeing anything related to passing the debug/debugQuery parameters on for inter-node shard queries and then add that to the aggregated response (if debug/debugQuery was specified.) Sounds worth a Jira. -- Jack Krupansky -Original Message- From: KNitin Sent: Wednesday, February 26, 2014 5:25 PM To: solr-user@lucene.apache.org Subject: Tracing Solr Query Execution and Performance Hi there I have a few very expensive queries (atleast thats what the QTime tells me) that is causing high CPU problems on a few nodes. Is there a way where I can "trace" or do an "explain" on the solr query to see where it spends more time? More like profiling on a per sub query basis? I have tried using debug=timing as a part of the query and it gives me stage level details (parsing, highlighting) but I need much more insights into where a query is spending time on Any help is much appreciated Thanks Nitin
Re: Solr cloud: Faceting issue on text field
Are you sure you want to be faceting on a text field, as opposed to a string field? I mean, each term (word) from the text will be a separate facet value. How many facet values are you typically returning? How many unique terms occur in the facet field? -- Jack Krupansky -Original Message- From: David Miller Sent: Wednesday, February 26, 2014 2:06 PM To: solr-user@lucene.apache.org Subject: Solr cloud: Faceting issue on text field Hi, I am encountering an issue where Solr nodes goes down when trying to obtain facets on a text field. The cluster consists of a few servers and have around 200 million documents (small to medium). I am trying the faceting first time on this field and it gives a 502 Bad Gateway error along with some of the nodes going down and solr getting generally slow. The text field can have few words to a few thousand words. The Solr version we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the logs, Zookeeper was giving an EndOfStreamException Any hint on this will be helpful. Thanks & Regards,
Re: SolrCloud Startup
Thanks, Shawn. I will try to upgrade Solr soon. Regarding firstSearcher: I think it does nothing now. I have configured it to use ExternalFileLoader, but the external file has no contents. Most of the queries hitting the collection are expensive tail queries. What would be your recommendation for warming the first searcher/new searcher? Thanks Nitin On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey wrote: > On 2/25/2014 4:30 PM, KNitin wrote: > >> Jeff : Thanks. I have tried reload before but it is not reliable (atleast >> in 4.3.1). A few cores get initialized and few dont (show as just >> recovering or down) and hence had to move away from it. Is it a known >> issue >> in 4.3.1? >> > > With Solr 4.3.1, you are running into this bug with reloads under > SolrCloud: > > https://issues.apache.org/jira/browse/SOLR-4805 > > The only way to recover from this bug is to restart Solr.The bug is fixed > in 4.4.0 and later. > > > Shawn,Otis,Erick >> >> Yes I have reviewed the page before and have given 1/4 of my mem to JVM >> and the rest to RAM/Os Cache. (15 Gb heap and 45 G to rest. Totally 60G >> machine). I have also reviewed the tlog file and they are in the order of >> KB (4-10 or 30). I have SSD and the reads are hardly noticable (in the >> order of 100Kb during that time frame). I have also disabled swap on all >> machines >> >> Regarding firstSearcher, It is currently set to externalFileLoader. What >> is >> the use of first searcher? I havent played around with it >> > > I don't think it's a good idea to have extensive warming queries. I do > exactly one query in firstSearcher and newSearcher: a query for all > documents with zero rows, sorted on our most common sort field. This is > designed purely to preload the sort data into the FieldCache. > > Thanks, > Shawn > >
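A minimal solrconfig.xml sketch of that single warming query (the sort field "timestamp" is an assumption standing in for whatever your most common sort field is):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="sort">timestamp desc</str>
    </lst>
  </arr>
</listener>

The same <lst> can be added under a newSearcher listener as well.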
Filter query exclusion with SolrJ
I use a SolrJ-based client to query Solr and I have been trying to construct HTTP requests where facet name/value pairs are excluded. The web interface I am working with has a refine-further functionality, which allows excluding one or more facet values. I have 3 facet fields: domain, content type and author, and I would like to be able to handle faceting by exclusion on each of them. For example, q=Dickens AND fq=-author:Dickens, Janet will construct the following HTTP request:

/solr/solrbase/select?q=Dickens&fq=-author:Dickens%2c+Janet&wt=json&indent=true

Whereas the facet values in the XML dump will look like: Dickens, Charles / Dickens, Sarah

So far, the Java implementation I am working with does not seem to handle filter query exclusion:

private HttpSolrServer solrServer;
solrServer = new HttpSolrServer("http://localhost:8983/solr/");

private static final String CONFIG_SOLR_FACET_FIELD = "facet_field";
private String[] _facetFields = new String[] {"author"};
private static final String CONFIG_SOLR_FACETS = "facets";

Element el = myParams.getChild(CONFIG_SOLR_FACETS);
_facetUse = el.getAttributeValue("useFacets", "true");
_facetMinCount = el.getAttributeValue("minCount", String.valueOf(1));
_facetLimit = el.getAttributeValue("limit", String.valueOf(20));
List vals = el.getChildren(CONFIG_SOLR_FACET_FIELD);
if (vals.size() > 0) {
  _facetFields = new String[vals.size()];
  for (int i=0; i < vals.size(); i++) {
    _facetFields[i] = ((Element)vals.get(i)).getTextTrim();
  }
}

SolrQuery query = new SolrQuery();
query.setQuery(qs);
List facetList = doc.getRootElement().getChildren("facet");
Iterator it = facetList.iterator();
while (it.hasNext()) {
  Element el = (Element)it.next();
  String name = el.getAttributeValue("name");
  String value = el.getTextTrim();
  if (name != null && value != null) {
    facets.add(name+":"+value);
  }
}
query.setQuery(qs).
  setFacet(Boolean.parseBoolean(_facetUse)).
  setFacetMinCount(Integer.parseInt(_facetMinCount)).
  setFacetLimit(Integer.parseInt(_facetLimit));
for (int i=0; i<_facetFields.length; i++) {
  query.addFacetField(_facetFields[i]);
}
for (int i=0; i<facets.size(); i++) {
  query.addFilterQuery((String)facets.get(i));
}

--
View this message in context: http://lucene.472066.n3.nabble.com/Filter-query-exclusion-with-SolrJ-tp4119974.html
Sent from the Solr - User mailing list archive at Nabble.com.
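One hedged way to express the exclusion itself: a negative filter query, with the raw value escaped so the embedded space survives (field and value are taken from the example above):

import org.apache.solr.client.solrj.util.ClientUtils;

// prefix with '-' to exclude; escapeQueryChars escapes the space (the comma is not special)
query.addFilterQuery("-author:" + ClientUtils.escapeQueryChars("Dickens, Janet"));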
Re: Solr cloud: Faceting issue on text field
Hi Greg, Thanks for the info. But the scenario in the link is a little bit different from my requirement. Regards, On Wed, Feb 26, 2014 at 4:46 PM, Greg Walters wrote: > I don't have much experience with faceting and its best practices though > I'm sure someone else on here can pipe up to address your questions there. > In the mean time have you read > http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/? > > > On Feb 26, 2014, at 3:26 PM, David Miller wrote: > > > Hi Greg, > > > > Yes, the memory and cpu spiked for that machine. Another issue I found in > > the log was "SolrException: Too many values for UnInvertedField faceting > on > > field". > > I was using the fc method. Will changing the method/params help? > > > > One thing I don't understand is that, the query was returning only a > single > > document, but the facet still seems to be having the issue. > > > > So, it should be technically possible to get facets on text field over > > 200-300 million docs at a decent speed, right? > > > > > > Regards, > > > > > > > > > > > > > > > > > > On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters >wrote: > > > >> IIRC faceting uses copious amounts of memory; have you checked for GC > >> activity while the query is running? > >> > >> Thanks, > >> Greg > >> > >> On Feb 26, 2014, at 1:06 PM, David Miller > wrote: > >> > >>> Hi, > >>> > >>> I am encountering an issue where Solr nodes goes down when trying to > >> obtain > >>> facets on a text field. The cluster consists of a few servers and have > >>> around 200 million documents (small to medium). I am trying the > faceting > >>> first time on this field and it gives a 502 Bad Gateway error along > with > >>> some of the nodes going down and solr getting generally slow. > >>> > >>> The text field can have few words to a few thousand words. The Solr > >> version > >>> we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking > the > >>> logs, Zookeeper was giving an EndOfStreamException > >>> > >>> Any hint on this will be helpful. > >>> > >>> Thanks & Regards, > >> > >> > >
Tracing Solr Query Execution and Performance
Hi there I have a few very expensive queries (at least, that's what the QTime tells me) that are causing high CPU problems on a few nodes. Is there a way I can "trace" or do an "explain" on a Solr query to see where it spends more time? More like profiling on a per-sub-query basis? I have tried using debug=timing as part of the query and it gives me stage-level details (parsing, highlighting), but I need much more insight into where a query is spending its time. Any help is much appreciated Thanks Nitin
Re: Autocommit, opensearchers and ingestion
I read that blog too! Great info. I've bumped up the commit times and turned the ingestion up a bit as well. I've upped hard commit to 5 minutes and the soft commit to 60 seconds.

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>

I'm still getting the same issue. After speaking to the engineer working on the ingestion code, he's told me that he's doing commits in his SolrJ code every 1000 items (configurable). Does that override my Solr server settings? On Tue, Feb 25, 2014 at 3:27 PM, Erick Erickson wrote: > Gopal: I'm glad somebody noticed that blog! > > Joel: > For bulk loads it's a Good Thing to lengthen out > your soft autocommit interval. A lot. Every second > poor Solr is trying to open up a new searcher while > you're throwing lots of documents at it. That's what's > generating the "too many searchers" problem I'd > guess. Soft commits are less expensive than hard > commits with openSearcher=true (you're not doing this, > and you shouldn't be). But soft commits aren't free. > All the top-level caches are thrown away and autowarming > is performed. > > Also, I'd probably consider just leaving off the bit about > maxDocs in your hard commit, I find it rarely does all > that much good. After all, even if you have to replay the > transaction log, you're only talking 15 seconds here. > > Best, > Erick > > > On Tue, Feb 25, 2014 at 12:08 PM, Gopal Patwa > wrote: > > > This blog by Eric will help you to understand different commit option and > > transaction logs and it does provide some recommendation for ingestion > > process. > > > > > > > http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > > > > > On Tue, Feb 25, 2014 at 11:40 AM, Furkan KAMACI > >wrote: > > > Hi; > > > > > > You should read here: > > > > > > > > > http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F > > > > > > On the other hand do you have 4 Zookeeper instances as a quorum? > > > > > > Thanks; > > > Furkan KAMACI > > > > > > > > > 2014-02-25 20:31 GMT+02:00 Joel Cohen : > > > > > > > Hi all, > > > > > > > > I'm working with Solr 4.6.1 and I'm trying to tune my ingestion > > process. > > > > The ingestion runs a big DB query and then does some ETL on it and > > > inserts > > > > via SolrJ. > > > > > > > > I have a 4 node cluster with 1 shard per node running in Tomcat with > > > > -Xmx=4096M. Each node has a separate instance of Zookeeper on it, plus > > > the > > > > ingestion server has one as well. The Solr servers have 8 cores and 64 > > Gb > > > > of total RAM. The ingestion server is a VM with 8 Gb and 2 cores. > > > > > > > > My ingestion code uses a few settings to control concurrency and batch > > > > size. > > > > > > > > solr.update.batchSize=500 > > > > solr.threadCount=4 > > > > > > > > With this setup, I'm getting a lot of errors and the ingestion is > > taking > > > > much longer than it should. > > > > > > > > Every so often during the ingestion I get these errors on the Solr > > > servers: > > > > > > > > WARN shard1 - 2014-02-25 11:18:34.341; > > > > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay > > > > > > > > > > > > > > tlog{file=/usr/local/solr_shard1/productCatalog/data/tlog/tlog.0014074 > > > > refcount=2} active=true starting pos=776774 > > > > WARN shard1 - 2014-02-25 11:18:37.275; > > > > org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished.
> > > > recoveryInfo=RecoveryInfo{adds=4065 deletes=0 deleteByQuery=0 > errors=0 > > > > positionOfStart=776774} > > > > WARN shard1 - 2014-02-25 11:18:37.960; > org.apache.solr.core.SolrCore; > > > > [productCatalog] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > > > > WARN shard1 - 2014-02-25 11:18:37.961; > org.apache.solr.core.SolrCore; > > > > [productCatalog] Error opening new searcher. exceeded limit of > > > > maxWarmingSearchers=2, try again later. > > > > WARN shard1 - 2014-02-25 11:18:37.961; > org.apache.solr.core.SolrCore; > > > > [productCatalog] Error opening new searcher. exceeded limit of > > > > maxWarmingSearchers=2, try again later. > > > > ERROR shard1 - 2014-02-25 11:18:37.961; > > > > org.apache.solr.common.SolrException; > > > org.apache.solr.common.SolrException: > > > > Error opening new searcher. exceeded limit of maxWarmingSearchers=2, > > try > > > > again later. > > > > at > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1575) > > > > at > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1346) > > > > at > > > > > > > > > > > > > > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:592) > > > > > > > > I cut threads down to 1 and batchSize down to 100 and the errors go > > away, > > > > but the upload time jumps up by a factor of 25. > > > > > > > > My solrconfig.xml has: > > > > > > > > > > > >${solr.autoCommit.maxDo
Re: concurrentlinkedhashmap 1.2 vs 1.4
Done, created under the SolrCloud component; couldn't find a more appropriate one like Server - Java or something. Hope it has all the info needed. I could contribute to it sometime next week; I'm waiting for new PC parts from Amazon to have a proper after-work dev environment. Regards, Guido. On 26/02/14 20:58, Mark Miller wrote: Thanks Guido - any chance you could file a JIRA issue for this? - Mark http://about.me/markrmiller On Feb 26, 2014, at 6:28 AM, Guido Medina wrote: I think it would need Guava v16.0.1 to benefit from the ported code. Guido. On 26/02/14 11:20, Guido Medina wrote: As notes also stated at concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (don't know to what version to be honest), so, wouldn't be better to use MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
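For what it's worth, a rough sketch of the Guava-side analogue (CacheBuilder absorbed MapMaker's cache methods; the sizes here are placeholders):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

Cache<String, Object> cache = CacheBuilder.newBuilder()
    .maximumSize(10000)      // bounded, like concurrentlinkedhashmap's capacity
    .concurrencyLevel(16)    // tune to the expected number of writer threads
    .build();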
Re: Solr cloud: Faceting issue on text field
I don't have much experience with faceting and its best practices though I'm sure someone else on here can pipe up to address your questions there. In the mean time have you read http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/? On Feb 26, 2014, at 3:26 PM, David Miller wrote: > Hi Greg, > > Yes, the memory and cpu spiked for that machine. Another issue I found in > the log was "SolrException: Too many values for UnInvertedField faceting on > field". > I was using the fc method. Will changing the method/params help? > > One thing I don't understand is that, the query was returning only a single > document, but the facet still seems to be having the issue. > > So, it should be technically possible to get facets on text field over > 200-300 million docs at a decent speed, right? > > > Regards, > > > > > > > > > On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters wrote: > >> IIRC faceting uses copious amounts of memory; have you checked for GC >> activity while the query is running? >> >> Thanks, >> Greg >> >> On Feb 26, 2014, at 1:06 PM, David Miller wrote: >> >>> Hi, >>> >>> I am encountering an issue where Solr nodes goes down when trying to >> obtain >>> facets on a text field. The cluster consists of a few servers and have >>> around 200 million documents (small to medium). I am trying the faceting >>> first time on this field and it gives a 502 Bad Gateway error along with >>> some of the nodes going down and solr getting generally slow. >>> >>> The text field can have few words to a few thousand words. The Solr >> version >>> we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the >>> logs, Zookeeper was giving an EndOfStreamException >>> >>> Any hint on this will be helpful. >>> >>> Thanks & Regards, >> >>
Parallel queries to Solr
Hi, We want to send parallel queries(2-3 queries) in the same request from client to Solr. How to send the parallel queries from client side(using Solrj). Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Parallel-queries-to-Solr-tp4119959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr cloud: Faceting issue on text field
Hi Greg, Yes, the memory and cpu spiked for that machine. Another issue I found in the log was "SolrException: Too many values for UnInvertedField faceting on field". I was using the fc method. Will changing the method/params help? One thing I don't understand is that, the query was returning only a single document, but the facet still seems to be having the issue. So, it should be technically possible to get facets on text field over 200-300 million docs at a decent speed, right? Regards, On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters wrote: > IIRC faceting uses copious amounts of memory; have you checked for GC > activity while the query is running? > > Thanks, > Greg > > On Feb 26, 2014, at 1:06 PM, David Miller wrote: > > > Hi, > > > > I am encountering an issue where Solr nodes goes down when trying to > obtain > > facets on a text field. The cluster consists of a few servers and have > > around 200 million documents (small to medium). I am trying the faceting > > first time on this field and it gives a 502 Bad Gateway error along with > > some of the nodes going down and solr getting generally slow. > > > > The text field can have few words to a few thousand words. The Solr > version > > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > > logs, Zookeeper was giving an EndOfStreamException > > > > Any hint on this will be helpful. > > > > Thanks & Regards, > >
Re: concurrentlinkedhashmap 1.2 vs 1.4
Thanks Guido - any chance you could file a JIRA issue for this? - Mark http://about.me/markrmiller On Feb 26, 2014, at 6:28 AM, Guido Medina wrote: > I think it would need Guava v16.0.1 to benefit from the ported code. > > Guido. > > On 26/02/14 11:20, Guido Medina wrote: >> As notes also stated at concurrentlinkedhashmap v1.4, the performance >> changes were ported to Guava (don't know to what version to be honest), so, >> wouldn't be better to use MapMaker builder? >> >> Regards, >> >> Guido. >> >> On 26/02/14 11:15, Guido Medina wrote: >>> Hi, >>> >>> I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, >>> according to notes at https://code.google.com/p/concurrentlinkedhashmap/ >>> version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x >>> designed against Java 6+? If so, wouldn't it benefit from v1.4? >>> >>> Regards, >>> >>> Guido. >> >
Re: Cluster state ranges are all null after reboot
Thanks Shalin, that code might be helpful... do you know if there is a reliable way to line up the ranges with the shard numbers? When the problem occurred we had 80 million documents already in the index, and could not issue even a basic 'deleteById' call. I'm tempted to assume they are just assigned linearly since our Test and Prod clusters both look to work that way now, but I can't be sure whether that is by design or just happenstance of boot order. And no, unfortunately we have not been able to reproduce this issue consistently despite trying a number of different things such as graceless stop/start and screwing with the underlying WAR file (which is what we thought puppet might be doing). The problem has occurred twice since, but always in our Test environment. The fact that Test has only a single replica per shard is the most likely culprit for me, but as mentioned, even gracelessly killing the last replica in the cluster seems to leave the range set correctly in clusterstate when we test it in isolation. In production (45 JVMs, 15 shards with 3 replicas each) we've never seen the problem, despite a similar number of rollouts for version changes etc. Ta, Greg On 26 February 2014 23:46, Shalin Shekhar Mangar wrote: > If you have 15 shards and assuming that you've never used shard > splitting, you can calculate the shard ranges by using new > CompositeIdRouter().partitionRange(15, new > CompositeIdRouter().fullRange()) > > This gives me: > [8000-9110, 9111-a221, a222-b332, > b333-c443, c444-d554, d555-e665, > e666-f776, f777-887, 888-1998, > 1999-2aa9, 2aaa-3bba, 3bbb-4ccb, > 4ccc-5ddc, 5ddd-6eed, 6eee-7fff] > > Have you done any more investigation into why this happened? Anything > strange in the logs? Are you able to reproduce this in a test > environment? > > On Wed, Feb 19, 2014 at 5:16 AM, Greg Pendlebury > wrote: > > We've got a 15 shard cluster spread across 3 hosts. This morning our > puppet > > software rebooted them all and afterwards the 'range' for each shard has > > become null in zookeeper. Is there any way to restore this value short of > > rebuilding a fresh index? > > > > I've read various questions from people with a similar problem, although > in > > those cases it is usually a single shard that has become null allowing > them > > to infer what the value should be and manually fix it in ZK. In this > case I > > have no idea what the ranges should be. This is our test cluster, and > > checking production I can see that the ranges don't appear to be > > predictable based on the shard number. > > > > I'm also not certain why it even occurred. Our test cluster only has a > > single replica per shard, so when a JVM is rebooted the cluster is > > unavailable... would that cause this? Production has 3 replicas so we can > > do rolling reboots. > > > > -- > Regards, > Shalin Shekhar Mangar. >
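For reference, a runnable sketch of Shalin's snippet against the 4.x solrj jar (whether the ranges line up with shard numbers linearly is exactly the open question above):

import java.util.List;
import org.apache.solr.common.cloud.CompositeIdRouter;
import org.apache.solr.common.cloud.DocRouter.Range;

CompositeIdRouter router = new CompositeIdRouter();
List<Range> ranges = router.partitionRange(15, router.fullRange());
for (int i = 0; i < ranges.size(); i++) {
    System.out.println("shard" + (i + 1) + " -> " + ranges.get(i));
}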
Re: Solr Permgen Exceptions when creating/removing cores
Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the error to happen more quickly. With this option on, it didn't seem to do the intermittent garbage collection that delayed the issue when the option was off. I was already using a max of 512MB, and I can reproduce it with it set this high or even higher. Right now, because of how we have this implemented, increasing it to something high just delays the problem :/ I would really appreciate anything else you could suggest. On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter wrote: > Hi Josh, > > Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM > versions, permgen collection was disabled by default. > > Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may > be too small. > > > Timothy Potter > Sr. Software Engineer, LucidWorks > www.lucidworks.com > > > From: Josh > Sent: Wednesday, February 26, 2014 12:27 PM > To: solr-user@lucene.apache.org > Subject: Solr Permgen Exceptions when creating/removing cores > > We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows > installation with 64bit Java 1.7U51 and we are seeing consistent issues > with PermGen exceptions. We have the permgen configured to be 512MB. > Bitnami ships with a 32bit version of Java for windows and we are replacing > it with a 64bit version. > > Passed in Java Options: > > -XX:MaxPermSize=64M > -Xms3072M > -Xmx6144M > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=75 > -XX:+CMSClassUnloadingEnabled > -XX:NewRatio=3 > > -XX:MaxTenuringThreshold=8 > > This is our use case: > > We have what we call a database core which remains fairly static and > contains the imported contents of a table from SQL server. We then have > user cores which contain the record ids of results from a text search > outside of Solr. We then query for the data we want from the database core > and limit the results to the content of the user core. This allows us to > combine facet data from Solr with the search results from another engine. > We are creating the user cores on demand and removing them when the user > logs out. > > Our issue is the constant creation and removal of user cores combined with > the constant importing seems to push us over our PermGen limit. The user > cores are removed at the end of every session and as a test I made an > application that would loop creating the user core, import a set of data to > it, query the database core using it as a limiter and then remove the user > core. My expectation was in this scenario that all the permgen associated > with that user cores would be freed upon it's unload and allow permgen to > reclaim that memory during a garbage collection. This was not the case, it > would constantly go up until the application would exhaust the memory. > > I also investigated whether the there was a connection between the two > cores left behind because I was joining them together in a query but even > unloading the database core after unloading all the user cores won't > prevent the limit from being hit or any memory to be garbage collected from > Solr. > > Is this a known issue with creating and unloading a large number of cores? > Could it be configuration based for the core? Is there something other than > unloading that needs to happen to free the references? > > Thanks > > Notes: I've tried using tools to determine if it's a leak within Solr such > as Plumbr and my activities turned up nothing. >
RE: Solr Permgen Exceptions when creating/removing cores
Hi Josh, Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, permgen collection was disabled by default. Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be too small. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: Josh Sent: Wednesday, February 26, 2014 12:27 PM To: solr-user@lucene.apache.org Subject: Solr Permgen Exceptions when creating/removing cores We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows installation with 64bit Java 1.7U51 and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32bit version of Java for windows and we are replacing it with a 64bit version. Passed in Java Options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from the database core and limit the results to the content of the user core. This allows us to combine facet data from Solr with the search results from another engine. We are creating the user cores on demand and removing them when the user logs out. Our issue is the constant creation and removal of user cores combined with the constant importing seems to push us over our PermGen limit. The user cores are removed at the end of every session and as a test I made an application that would loop creating the user core, import a set of data to it, query the database core using it as a limiter and then remove the user core. My expectation was in this scenario that all the permgen associated with that user cores would be freed upon it's unload and allow permgen to reclaim that memory during a garbage collection. This was not the case, it would constantly go up until the application would exhaust the memory. I also investigated whether the there was a connection between the two cores left behind because I was joining them together in a query but even unloading the database core after unloading all the user cores won't prevent the limit from being hit or any memory to be garbage collected from Solr. Is this a known issue with creating and unloading a large number of cores? Could it be configuration based for the core? Is there something other than unloading that needs to happen to free the references? Thanks Notes: I've tried using tools to determine if it's a leak within Solr such as Plumbr and my activities turned up nothing.
CollapsingQParserPlugin is slower than standard Solr field grouping in Solr 4.6.1
I notice that in Solr 4.6.1 CollapsingQParserPlugin is slower than standard Solr field grouping. I have a Solr index of 1 docs, with a signature field which is a Solr dedup field of the doc content. The majority of the signatures are unique. With standard Solr field grouping, http://localhost:4462/solr/collection1/select?q=*:*&group.ngroups=true&group=true&group.field=signature&group.main=true&rows=1&fl=id I get an average QTime of 78 after Solr has warmed up. Using CollapsingQParserPlugin, http://localhost:4462/solr/collection1/select?q=*:*&fq={!collapse%20field=signature}&rows=1&fl=id I get an average QTime of 89.2. In fact, CollapsingQParserPlugin QTime is always slower than standard Solr field grouping. How can I get CollapsingQParserPlugin to run faster? Joe
Solr Permgen Exceptions when creating/removing cores
We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows installation with 64bit Java 1.7U51 and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32bit version of Java for windows and we are replacing it with a 64bit version. Passed in Java Options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from the database core and limit the results to the content of the user core. This allows us to combine facet data from Solr with the search results from another engine. We are creating the user cores on demand and removing them when the user logs out. Our issue is that the constant creation and removal of user cores, combined with the constant importing, seems to push us over our PermGen limit. The user cores are removed at the end of every session, and as a test I made an application that would loop creating the user core, import a set of data to it, query the database core using it as a limiter and then remove the user core. My expectation in this scenario was that all the permgen associated with those user cores would be freed upon their unload, allowing permgen to reclaim that memory during a garbage collection. This was not the case; it would constantly go up until the application exhausted the memory. I also investigated whether there was a connection between the two cores left behind because I was joining them together in a query, but even unloading the database core after unloading all the user cores didn't prevent the limit from being hit or any memory from being garbage collected from Solr. Is this a known issue with creating and unloading a large number of cores? Could it be configuration based for the core? Is there something other than unloading that needs to happen to free the references? Thanks Note: I've tried using tools such as Plumbr to determine if it's a leak within Solr, and my activities turned up nothing.
Re: Solr cloud: Faceting issue on text field
IIRC faceting uses copious amounts of memory; have you checked for GC activity while the query is running? Thanks, Greg On Feb 26, 2014, at 1:06 PM, David Miller wrote: > Hi, > > I am encountering an issue where Solr nodes goes down when trying to obtain > facets on a text field. The cluster consists of a few servers and have > around 200 million documents (small to medium). I am trying the faceting > first time on this field and it gives a 502 Bad Gateway error along with > some of the nodes going down and solr getting generally slow. > > The text field can have few words to a few thousand words. The Solr version > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > logs, Zookeeper was giving an EndOfStreamException > > Any hint on this will be helpful. > > Thanks & Regards,
Solr cloud: Faceting issue on text field
Hi, I am encountering an issue where Solr nodes go down when trying to obtain facets on a text field. The cluster consists of a few servers and has around 200 million documents (small to medium). I am trying faceting on this field for the first time, and it gives a 502 Bad Gateway error along with some of the nodes going down and Solr getting generally slow. The text field can have a few words to a few thousand words. The Solr version we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the logs, Zookeeper was giving an EndOfStreamException. Any hint on this will be helpful. Thanks & Regards,
Re: SolrCloud: How to replicate shard of another machine for failover?
Hi; As Daniel mentioned, it is just for the "first time" and not a suggested approach. However, if you follow that way you can assign shards to machines. On the other hand, you cannot change it later with the same procedure. Thanks; Furkan KAMACI 2014-02-26 15:53 GMT+02:00 Daniel Collins : > This is only true the *first* time you start the cluster. As mentioned > earlier, the correct way to assign shards to cores is to use the collection > API. Failing that, you can start cores in a determined order, and the > cores will assign themselves a shard/replica when they first start up. > From that point on, that mapping is defined in clusterstate.json, and will > persist until you change it (delete cluster state or use collection/core > API to move/remove a core. It is a kludgy approach, that's why generally > it isn't recommended for new starters to use, but by starting the first > cores in a particular order you can get exactly the distribution you want. > > The collection API is good generally because it has some logic to > distribute shards across machines, but you can't be very specific with it, > you can't say "I want shard 1 on machine A, and its replicas on machines b, > c & d). So we use the "start order" mechanism for our production systems, > because we want to place shards on specific machines., We have 256 shards, > so we want to know exactly what set of cores & machines is required in > order to have a "full collection" of data. As long as you are aware of the > limitations of each mechanism, both work. > > > On 26 February 2014 10:26, Oliver Schrenk > wrote: > > > There is a round robin process when assigning nodes at cluster. If you > > want > > > to achieve what you want you should change your Solr start up order. > > > > Well that is just weird. To bring a cluster to a reproducible state, I > > have to bring the whole cluster down, and start it up again in a specific > > order? > > > > What order do you suggest, to have a failover mechanism? >
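For the record, a hedged example of the Collections API call being discussed (Solr 4.6-era syntax; hosts and the collection name are placeholders):

http://hostA:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&createNodeSet=hostA:8983_solr,hostB:8983_solr,hostC:8983_solr,hostD:8983_solr

createNodeSet restricts which nodes receive cores, but as noted above, it cannot pin a specific shard or replica to a specific machine.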
Re: Format of the spellcheck.q used to get suggestions in current filter
I'm afraid I have to manually retrieve all docs for the suggested query in the current filter (category:Cars&q=Renau) and count them myself to get the frequency within a given filter. 2014-02-26 19:09 GMT+01:00 Hakim Benoudjit : > It seems that suggestion frequency stays the same with filter query (fq). > > > 2014-02-26 19:05 GMT+01:00 Ahmet Arslan : > > >> >> Just a guess, what happens when you use filter query? >> fq=category:Cars&q=Renau >> >> >> >> On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit < >> h.benoud...@gmail.com> wrote: >> I mean that: I want suggestions frequency to count only document in >> current >> query (solr 'q'). My issue is even if suggestion 'word' is correct; the >> frequency is relative to all index and not only to the current query. >> Suppose that I have 'q = category:Cars', in this case, if my searched >> query >> is 'Renau' (for cars model), suggestions frequence should only count cars >> having the name 'Renault', not persons >> >> >> >> 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : >> >> > Hi, >> > >> > What do you mean by "suggestions only for current category" ? Do you >> mean >> > that suggested word(s) should return non-zero hits for that category? >> > >> > Ahmet >> > >> > >> > >> > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < >> > h.benoud...@gmail.com> wrote: >> > @Jack Krupansky, here is the important portion of my solrconfig.xml: >> > >> > >> > default >> > title >> > solr.DirectSolrSpellChecker >> > >> > internal >> > >> > 0.5 >> > >> > 2 >> > >> > 1 >> > >> > 5 >> > >> > 4 >> > >> > 0.01 >> > >> > >> > >> > As you guess 'title' field is the one I'm searching & the one I'm >> building >> > my suggestions from. >> > >> > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my >> issues, >> > cause I want to get suggestions only for current category. >> > >> > >> > >> > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : >> > >> > > Hi Hakim, >> > > >> > > According to wiki spellcheck.q is intended to use with 'spelling >> ready' >> > > query/input. >> > > 'spelling ready' means it does not contain field names, AND, OR, etc. >> > > Something like should work. spellcheck.q=value1 >> value2&q=+field1:value1 >> > > +field2:value2 >> > > >> > > Ahmet >> > > >> > > >> > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < >> > > h.benoud...@gmail.com> wrote: >> > > I have some difficulties to use `spellcheck.q` to get only suggestions >> > for >> > > current query. >> > > >> > > When I set `spellcheck.q` to lucene query format (field1:value1 AND >> > > field2:value2), it doesnt return me any result. >> > > >> > > I have supposed that the value stored in `spellcheck.q` is just the >> value >> > > of ``spellcheck` component default field, but it returns an error in >> this >> > > case. >> > > >> > > Any help please? >> > > >> > > >> > >> > >> >> >
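A sketch of an alternative that may avoid fetching the documents manually: spellcheck collation re-runs candidate corrections as real queries, filters included, so the reported hit counts respect the fq. The handler path and values below are assumptions for illustration; the parameters themselves are standard Solr 4.x spellcheck options.

  http://host:8983/solr/select?q=Renau&fq=category:Cars
      &spellcheck=true
      &spellcheck.collate=true
      &spellcheck.maxCollationTries=5
      &spellcheck.collateExtendedResults=true

With spellcheck.collateExtendedResults, each collation comes back with a hits count computed against the full query, fq and all, unlike the raw term frequencies taken from the spellcheck dictionary.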
Re: Format of the spellcheck.q used to get suggestions in current filter
It seems that suggestion frequency stays the same with filter query (fq). 2014-02-26 19:05 GMT+01:00 Ahmet Arslan : > > > Just a guess, what happens when you use filter query? > fq=category:Cars&q=Renau > > > > On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I mean that: I want suggestions frequency to count only document in current > query (solr 'q'). My issue is even if suggestion 'word' is correct; the > frequency is relative to all index and not only to the current query. > Suppose that I have 'q = category:Cars', in this case, if my searched query > is 'Renau' (for cars model), suggestions frequence should only count cars > having the name 'Renault', not persons > > > > 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > > > Hi, > > > > What do you mean by "suggestions only for current category" ? Do you mean > > that suggested word(s) should return non-zero hits for that category? > > > > Ahmet > > > > > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > > > > default > > title > > solr.DirectSolrSpellChecker > > > > internal > > > > 0.5 > > > > 2 > > > > 1 > > > > 5 > > > > 4 > > > > 0.01 > > > > > > > > As you guess 'title' field is the one I'm searching & the one I'm > building > > my suggestions from. > > > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > > cause I want to get suggestions only for current category. > > > > > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > > > Hi Hakim, > > > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > > query/input. > > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > > +field2:value2 > > > > > > Ahmet > > > > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > > h.benoud...@gmail.com> wrote: > > > I have some difficulties to use `spellcheck.q` to get only suggestions > > for > > > current query. > > > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > > field2:value2), it doesnt return me any result. > > > > > > I have supposed that the value stored in `spellcheck.q` is just the > value > > > of ``spellcheck` component default field, but it returns an error in > this > > > case. > > > > > > Any help please? > > > > > > > > > > > >
Re: Format of the spellcheck.q used to get suggestions in current filter
Just a guess, what happens when you use filter query? fq=category:Cars&q=Renau On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit wrote: I mean that: I want suggestions frequency to count only document in current query (solr 'q'). My issue is even if suggestion 'word' is correct; the frequency is relative to all index and not only to the current query. Suppose that I have 'q = category:Cars', in this case, if my searched query is 'Renau' (for cars model), suggestions frequence should only count cars having the name 'Renault', not persons 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > Hi, > > What do you mean by "suggestions only for current category" ? Do you mean > that suggested word(s) should return non-zero hits for that category? > > Ahmet > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > default > title > solr.DirectSolrSpellChecker > > internal > > 0.5 > > 2 > > 1 > > 5 > > 4 > > 0.01 > > > > As you guess 'title' field is the one I'm searching & the one I'm building > my suggestions from. > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > cause I want to get suggestions only for current category. > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > Hi Hakim, > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > query/input. > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > +field2:value2 > > > > Ahmet > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > I have some difficulties to use `spellcheck.q` to get only suggestions > for > > current query. > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > field2:value2), it doesnt return me any result. > > > > I have supposed that the value stored in `spellcheck.q` is just the value > > of ``spellcheck` component default field, but it returns an error in this > > case. > > > > Any help please? > > > > > >
Re: Format of the spellcheck.q used to get suggestions in current filter
I mean that I want the suggestion frequency to count only documents in the current query (Solr 'q'). My issue is that even if the suggested word is correct, the frequency is relative to the whole index and not only to the current query. Suppose I have 'q = category:Cars'; in this case, if my searched term is 'Renau' (a car model), the suggestion frequency should only count cars named 'Renault', not persons. 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > Hi, > > What do you mean by "suggestions only for current category" ? Do you mean > that suggested word(s) should return non-zero hits for that category? > > Ahmet > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > default > title > solr.DirectSolrSpellChecker > > internal > > 0.5 > > 2 > > 1 > > 5 > > 4 > > 0.01 > > > > As you guess 'title' field is the one I'm searching & the one I'm building > my suggestions from. > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > cause I want to get suggestions only for current category. > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > Hi Hakim, > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > query/input. > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > +field2:value2 > > > > Ahmet > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > I have some difficulties to use `spellcheck.q` to get only suggestions > for > > current query. > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > field2:value2), it doesnt return me any result. > > > > I have supposed that the value stored in `spellcheck.q` is just the value > > of ``spellcheck` component default field, but it returns an error in this > > case. > > > > Any help please? > > > > > >
Search score problem using bf edismax
Hi, I'm new to Solr and I'm trying to understand why I don't get what I want with the bf parameter. The query debug information follows. What I don't understand is why the result of the bf parameter is so low in score compared to the matched fields. Can anyone help? Thank you

[The XML response did not survive the archive intact; it held the response header (status 0, QTime 19) and the stored fields of the three matching car-accessory documents returned for the query "iphone cavo". The recoverable debug sections follow.]

Parsed query (the bf function is attached as an extra optional clause):

  (+((DisjunctionMaxQuery((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone)) DisjunctionMaxQuery((categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo)))~2) FunctionQuery(1.0/(1.0E-9*float(int(rank1_8))+1.0)))/no_coord

Boost function:

  1.0/(1.0E-9*float(int(rank1_8))+1.0)

Score explanation (excerpt; truncated in the original message):

  0.8545726 = (MATCH) sum of:
    0.82827055 = (MATCH) sum of:
      0.33939165 = (MATCH) max of:
        0.33939165 = (MATCH) weight(modello_s:iphone in 24160) [DefaultSimilarity], result of:
          0.33939165 = score(doc=24160,freq=1.0 = termFreq=1.0), product of:
            0.21819489 = queryWeight, product of:
              8.295743 = idf(docFreq=170, maxDocs=252056)
              0.02630203 = queryNorm
            1.5554519 = fieldWeight in 24160, product of:
              1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
              8.295743 = idf(docFreq=170, maxDocs=252056)
              0.1875 = fieldNorm(doc=24160)
      0.48887888 = (MATCH) max of: 0. [the rest of the explain was truncated]
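One observation on the low bf contribution visible in the parsed query above: bf adds the function's value to the tf-idf score as just another optional clause, and a reciprocal capped at 1.0 is easily dwarfed by field matches scoring in the same range or higher. On edismax there is also the multiplicative boost parameter, which scales the whole query score instead. A hedged illustration using this thread's function (written here with recip, which is equivalent to the FunctionQuery shown):

  # additive: the function result is summed into the score (what the debug shows for bf)
  bf=recip(rank1_8,1.0e-9,1,1)

  # multiplicative: the query score is multiplied by the function result
  boost=recip(rank1_8,1.0e-9,1,1)

Whether multiplicative boosting matches the intent here is a judgment call, but it usually behaves more predictably relative to the match scores.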
Re: Format of the spellcheck.q used to get suggestions in current filter
Hi, What do you mean by "suggestions only for current category" ? Do you mean that suggested word(s) should return non-zero hits for that category? Ahmet On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit wrote: @Jack Krupansky, here is the important portion of my solrconfig.xml: default title solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01 As you guess 'title' field is the one I'm searching & the one I'm building my suggestions from. @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, cause I want to get suggestions only for current category. 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > Hi Hakim, > > According to wiki spellcheck.q is intended to use with 'spelling ready' > query/input. > 'spelling ready' means it does not contain field names, AND, OR, etc. > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > +field2:value2 > > Ahmet > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I have some difficulties to use `spellcheck.q` to get only suggestions for > current query. > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > field2:value2), it doesnt return me any result. > > I have supposed that the value stored in `spellcheck.q` is just the value > of ``spellcheck` component default field, but it returns an error in this > case. > > Any help please? > >
Re: Format of the spellcheck.q used to get suggestions in current filter
@Jack Krupansky, here is the important portion of my solrconfig.xml:

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

As you can guess, 'title' is the field I'm searching and the one I'm building my suggestions from. @Ahmet Arslan: I understand that `spellcheck.q` doesn't resolve my issue, because I want to get suggestions only for the current category. 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > Hi Hakim, > > According to wiki spellcheck.q is intended to use with 'spelling ready' > query/input. > 'spelling ready' means it does not contain field names, AND, OR, etc. > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > +field2:value2 > > Ahmet > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I have some difficulties to use `spellcheck.q` to get only suggestions for > current query. > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > field2:value2), it doesnt return me any result. > > I have supposed that the value stored in `spellcheck.q` is just the value > of ``spellcheck` component default field, but it returns an error in this > case. > > Any help please? > >
Re: Format of the spellcheck.q used to get suggestions in current filter
Hi Hakim, According to the wiki, spellcheck.q is intended to be used with 'spelling ready' query/input. 'Spelling ready' means it does not contain field names, AND, OR, etc. Something like this should work: spellcheck.q=value1 value2&q=+field1:value1 +field2:value2 Ahmet On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit wrote: I have some difficulties to use `spellcheck.q` to get only suggestions for current query. When I set `spellcheck.q` to lucene query format (field1:value1 AND field2:value2), it doesnt return me any result. I have supposed that the value stored in `spellcheck.q` is just the value of ``spellcheck` component default field, but it returns an error in this case. Any help please?
Re: Format of the spellcheck.q used to get suggestions in current filter
Could you post the request URL and the XML/JSON Solr response? And the solrconfig for both the query request handler and the spellcheck component. Is your spell check component configured for both fields, field1 and field2? -- Jack Krupansky -Original Message- From: Hakim Benoudjit Sent: Wednesday, February 26, 2014 10:50 AM To: solr-user@lucene.apache.org Subject: Format of the spellcheck.q used to get suggestions in current filter I have some difficulties to use `spellcheck.q` to get only suggestions for current query. When I set `spellcheck.q` to lucene query format (field1:value1 AND field2:value2), it doesnt return me any result. I have supposed that the value stored in `spellcheck.q` is just the value of ``spellcheck` component default field, but it returns an error in this case. Any help please?
Format of the spellcheck.q used to get suggestions in current filter
I am having difficulty using `spellcheck.q` to get suggestions only for the current query. When I set `spellcheck.q` to Lucene query format (field1:value1 AND field2:value2), it doesn't return any results. I then supposed that the value of `spellcheck.q` should just be a value for the `spellcheck` component's default field, but that returns an error. Any help please?
[ANNOUNCE] Apache Solr 4.7.0 released.
February 2014, Apache Solr™ 4.7 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.7 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Solr 4.7 Release Highlights: * A new 'migrate' collection API to split all documents with a route key into another collection. * Added support for tri-level compositeId routing. * Admin UI - Added a new "Files" conf directory browser/file viewer. * Add a QParserPlugin for Lucene's SimpleQueryParser. * Suggest improvements: a new SuggestComponent that fully utilizes the Lucene suggester module; queries can now use multiple suggesters; Lucene's FreeTextSuggester and BlendedInfixSuggester are now supported. * New 'cursorMark' request param for efficient deep paging of sorted result sets. See http://s.apache.org/cursorpagination * Add a Solr contrib that allows for building Solr indexes via Hadoop's MapReduce. * Upgrade to Spatial4j 0.4. Various new options are now exposed automatically for an RPT field type. See Spatial4j CHANGES & javadocs. https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md * SSL support for SolrCloud. Solr 4.7 also includes many other new features as well as numerous optimizations and bugfixes. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
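For those curious about the new cursorMark parameter, a minimal deep-paging sketch against SolrJ 4.7; the Solr URL, collection and sort field are illustrative only:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  public class CursorPaging {
      public static void main(String[] args) throws Exception {
          HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
          SolrQuery q = new SolrQuery("*:*");
          q.setRows(100);
          // cursors require a total sort order ending on the uniqueKey field
          q.setSort("id", SolrQuery.ORDER.asc);
          String cursor = CursorMarkParams.CURSOR_MARK_START;  // the literal "*"
          while (true) {
              q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
              QueryResponse rsp = server.query(q);
              // ... process rsp.getResults() ...
              String next = rsp.getNextCursorMark();
              // the same mark returned twice means there are no more results
              if (cursor.equals(next)) break;
              cursor = next;
          }
      }
  }

Unlike start/rows paging, each page costs the same regardless of how deep into the result set it is.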
Re: SolrCloud: How to replicate shard of another machine for failover?
This is only true the *first* time you start the cluster. As mentioned earlier, the correct way to assign shards to cores is to use the collection API. Failing that, you can start cores in a determined order, and the cores will assign themselves a shard/replica when they first start up. From that point on, that mapping is defined in clusterstate.json, and will persist until you change it (delete the cluster state, or use the collection/core API to move/remove a core). It is a kludgy approach, which is why it generally isn't recommended for new starters, but by starting the first cores in a particular order you can get exactly the distribution you want. The collection API is good generally because it has some logic to distribute shards across machines, but you can't be very specific with it; you can't say "I want shard 1 on machine A, and its replicas on machines b, c & d". So we use the "start order" mechanism for our production systems, because we want to place shards on specific machines. We have 256 shards, so we want to know exactly what set of cores & machines is required in order to have a "full collection" of data. As long as you are aware of the limitations of each mechanism, both work. On 26 February 2014 10:26, Oliver Schrenk wrote: > > There is a round robin process when assigning nodes at cluster. If you > want > > to achieve what you want you should change your Solr start up order. > > Well that is just weird. To bring a cluster to a reproducible state, I > have to bring the whole cluster down, and start it up again in a specific > order? > > What order do you suggest, to have a failover mechanism?
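For completeness, a sketch of the Collections API call being discussed; the collection name and hosts are placeholders. On the 4.x API, createNodeSet restricts which nodes receive cores, and the round-robin placement then runs over just that set; there is still no per-shard pinning.

  http://host1:8983/solr/admin/collections?action=CREATE&name=mycollection
      &numShards=4&replicationFactor=2
      &createNodeSet=host1:8983_solr,host2:8983_solr,host3:8983_solr,host4:8983_solr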
Re: programmatically disable/enable solr queryResultCache...
Shalin, Great, thanks for the clear explanation. Let me try to make my scoring function part of the QueryResultKey. Thanks & Regards, Senthilnathan V On Wed, Feb 26, 2014 at 5:40 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > The problem here is that your custom scoring function (is that a > SearchComponent?) is not part of a query. The query cache is defined > as SolrCache where the QueryResultKey contains > Query, Sort, SortField[] and filters=List. So your custom > scoring function either needs to be present in the QueryResultKey or > else you need to disable the query result cache via configuration. > > On Wed, Feb 26, 2014 at 12:09 PM, Senthilnathan Vijayaraja > wrote: > > Erick, > > Thanks for the response. > > > > Kindly have a look at my sample query, > > > > select?fl=city,$score&q=*:*&fq={!lucene q.op=OR df=city > v=$cit}&cit=Chennai& > > > > *sort=$score desc& score=norm($la,value,10)& la=8 &b=1&c=2*here, > > score= norm($la,value,10), norm is a custom function > > > > *,if I change la then the $score will change.* > > first time it work fine but if I am changing la alone and firing the > query > > again the result remains in the same order as first query result.Which > > means sorting is not happening even the score is different.But If I am > > changing the cit=Chennai to cit=someCity then I am getting result in > > proper order,means sorting works fine. > > > > At any rate, queryResultCache is unlikely to impact > > much. All it is is > > *a map containing the query and the first few document IDs *(internal > > Lucene). > > > > which means query is the unique key and list of document ids are values > > mapped with that key.If I am not wrong, > > > > may I know how solr builds the unique keys based on the queries. > > > > Whether it builds the key based on only solr common query parameters or > it > > will include all the parameters supplied by user as part of query(for e.g > > la=8&b=1&c=2 ). > > > > > > any clue? > > > > > > Thanks & Regards, > > Senthilnathan V > > > > > > On Tue, Feb 25, 2014 at 8:00 PM, Erick Erickson >wrote: > > > >> This seems like an XY problem, you're asking for > >> specifics on doing something without any indication > >> _why_ you think this would help. Nor are you explaining > >> what the problem you're having is in the first place. > >> > >> At any rate, queryResultCache is unlikely to impact > >> much. All it is is a map containing the query and > >> the first few document IDs (internal Lucene). See > >> in solrconfig.xml. It is > >> quite light-weight, it does NOT store the entire > >> result set, nor even the contents of the documents. > >> > >> Best > >> Erick > >> > >> > >> On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja < > >> senthilnat...@8kmiles.com> wrote: > >> > >> > is there any way programmatically disable/enable solr > queryResultCache? > >> > > >> > I am using SolrJ. > >> > > >> > > >> > Thanks & Regards, > >> > Senthilnathan V > >> > > >> > > > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Cluster state ranges are all null after reboot
If you have 15 shards and assuming that you've never used shard splitting, you can calculate the shard ranges by using new CompositeIdRouter().partitionRange(15, new CompositeIdRouter().fullRange()) This gives me: [8000-9110, 9111-a221, a222-b332, b333-c443, c444-d554, d555-e665, e666-f776, f777-887, 888-1998, 1999-2aa9, 2aaa-3bba, 3bbb-4ccb, 4ccc-5ddc, 5ddd-6eed, 6eee-7fff] Have you done any more investigation into why this happened? Anything strange in the logs? Are you able to reproduce this in a test environment? On Wed, Feb 19, 2014 at 5:16 AM, Greg Pendlebury wrote: > We've got a 15 shard cluster spread across 3 hosts. This morning our puppet > software rebooted them all and afterwards the 'range' for each shard has > become null in zookeeper. Is there any way to restore this value short of > rebuilding a fresh index? > > I've read various questions from people with a similar problem, although in > those cases it is usually a single shard that has become null allowing them > to infer what the value should be and manually fix it in ZK. In this case I > have no idea what the ranges should be. This is our test cluster, and > checking production I can see that the ranges don't appear to be > predictable based on the shard number. > > I'm also not certain why it even occurred. Our test cluster only has a > single replica per shard, so when a JVM is rebooted the cluster is > unavailable... would that cause this? Production has 3 replicas so we can > do rolling reboots. -- Regards, Shalin Shekhar Mangar.
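For anyone who needs to recompute ranges the same way, a self-contained sketch against the SolrJ 4.x classes used above (the 15 matches this thread's shard count):

  import java.util.List;

  import org.apache.solr.common.cloud.CompositeIdRouter;
  import org.apache.solr.common.cloud.DocRouter;

  public class ShardRanges {
      public static void main(String[] args) {
          CompositeIdRouter router = new CompositeIdRouter();
          // partition the full 32-bit hash space into 15 contiguous ranges,
          // mirroring what collection creation does when no ranges are supplied
          List<DocRouter.Range> ranges = router.partitionRange(15, router.fullRange());
          for (DocRouter.Range range : ranges) {
              // Range prints in the hex min-max form used in clusterstate.json
              System.out.println(range);
          }
      }
  }

Each printed range can then be pasted back into the null range fields in clusterstate.json (after backing it up).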
Function Query does not work properly
Hi, I have a small problem using function queries. Following http://wiki.apache.org/solr/FunctionQuery#Date_Boosting and http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents I've tried using function queries to boost newer documents over older ones. In my case, I have documents with dates in the future, so I tried to adapt the example: all dates in the future should get a boost multiplier of 1. I therefore tried the following function: through the map function, all dates up to 100 years in the future should become 0, while all past dates pass through unchanged, so that after the recip function all future dates end up with a boost multiplier of 1 and all past dates get the normal value for their age: recip(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11,1,1) Unfortunately, this does not seem to work - this function seems to return 1 for any date_field value. After that, I tried a workaround, emulating the recip function with the div, product and sum functions: div(1,sum(product(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11),1)) This also did not work. Finally I checked whether the map function returns correct values by executing it alone, so that all future dates should end up with 0 for their score. This DID work: map(product(ms(NOW,date_field),3.16e-11),-100,0,0) So my question now: am I doing something wrong, or is there a bug in the recip function? I am currently using Solr 4.5.1. Thanks for your help, Jan
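One way to narrow this down without changing the scoring: Solr 4.x can return function values as pseudo-fields, so each sub-expression can be inspected per document. A sketch with this thread's function (the aliases 'mapped' and 'boosted' are arbitrary names, and the whole fl value goes on one line in a real request):

  fl=date_field,score,
     mapped:map(product(ms(NOW,date_field),3.16e-11),-100,0,0),
     boosted:recip(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11,1,1)

If 'mapped' shows the expected zeros and positive ages while 'boosted' stays pinned at 1, the problem is isolated to how recip consumes the map output.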
Re: programmatically disable/enable solr queryResultCache...
The problem here is that your custom scoring function (is that a SearchComponent?) is not part of a query. The query cache is defined as SolrCache<QueryResultKey,DocList> where the QueryResultKey contains the Query, Sort, SortField[] and filters=List<Query>. So your custom scoring function either needs to be present in the QueryResultKey or else you need to disable the query result cache via configuration. On Wed, Feb 26, 2014 at 12:09 PM, Senthilnathan Vijayaraja wrote: > Erick, > Thanks for the response. > > Kindly have a look at my sample query, > > select?fl=city,$score&q=*:*&fq={!lucene q.op=OR df=city v=$cit}&cit=Chennai& > > *sort=$score desc& score=norm($la,value,10)& la=8 &b=1&c=2*here, > score= norm($la,value,10), norm is a custom function > > *,if I change la then the $score will change.* > first time it work fine but if I am changing la alone and firing the query > again the result remains in the same order as first query result.Which > means sorting is not happening even the score is different.But If I am > changing the cit=Chennai to cit=someCity then I am getting result in > proper order,means sorting works fine. > > At any rate, queryResultCache is unlikely to impact > much. All it is is > *a map containing the query and the first few document IDs *(internal > Lucene). > > which means query is the unique key and list of document ids are values > mapped with that key.If I am not wrong, > > may I know how solr builds the unique keys based on the queries. > > Whether it builds the key based on only solr common query parameters or it > will include all the parameters supplied by user as part of query(for e.g > la=8&b=1&c=2 ). > > > any clue? > > > Thanks & Regards, > Senthilnathan V > > > On Tue, Feb 25, 2014 at 8:00 PM, Erick Erickson > wrote: > >> This seems like an XY problem, you're asking for >> specifics on doing something without any indication >> _why_ you think this would help. Nor are you explaining >> what the problem you're having is in the first place. >> >> At any rate, queryResultCache is unlikely to impact >> much. All it is is a map containing the query and >> the first few document IDs (internal Lucene). See >> in solrconfig.xml. It is >> quite light-weight, it does NOT store the entire >> result set, nor even the contents of the documents. >> >> Best >> Erick >> >> >> On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja < >> senthilnat...@8kmiles.com> wrote: >> >> > is there any way programmatically disable/enable solr queryResultCache? >> > >> > I am using SolrJ. >> > >> > >> > Thanks & Regards, >> > Senthilnathan V >> > >> -- Regards, Shalin Shekhar Mangar.
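For the configuration route, the relevant knob is the queryResultCache entry in solrconfig.xml; removing or commenting it out disables the cache for the core. The sizes shown are the stock example values, not a recommendation:

  <!-- comment out or remove to disable the query result cache
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  -->

Since, as noted above, there is no per-request switch for this cache, the alternative really is to fold the custom function's inputs into the query itself (e.g. into q or an fq) so the QueryResultKey changes whenever they do.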
Re: Knowing shard value of result
Ah, I didn't know that this is possible with DocTransformers. This is also possible in Solr 4.7 (to be released soon) by using shards.info=true in the request. On Wed, Feb 26, 2014 at 2:32 PM, Ahmet Arslan wrote: > Hi, > > I think with this : https://wiki.apache.org/solr/DocTransformers#A.5Bshard.5D > > Ahmet > > > > On Wednesday, February 26, 2014 10:36 AM, search engn dev > wrote: > I have setup solr cloud of two shards and two replicas. I am using solrj for > communicating with solr. We are using CloudSolrServer for searching in solr > cloud. below is my code > > String zkHost = > "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; > CloudSolrServer server = new CloudSolrServer(zkHost); > server.connect(); > server.setDefaultCollection(defaultCollection); > server.setIdField("Id"); > SolrQuery parameters = new SolrQuery(); > parameters.set("q","*:*"); > QueryResponse response = server.query(parameters); > System.out.println(""+response.toString()); > > I am getting correct response from solr. But how do i know the requested > solr hosts. ? because request can go to any live solr host. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Regards, Shalin Shekhar Mangar.
Re: concurrentlinkedhashmap 1.2 vs 1.4
I think it would need Guava v16.0.1 to benefit from the ported code. Guido. On 26/02/14 11:20, Guido Medina wrote: As notes also stated at concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (don't know to what version to be honest), so, wouldn't be better to use MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
Re: concurrentlinkedhashmap 1.2 vs 1.4
As the notes also state for concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (I don't know into which version, to be honest), so wouldn't it be better to use the MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
concurrentlinkedhashmap 1.2 vs 1.4
Hi, I noticed Solr is using concurrentlinkedhashmap v1.2, which targets Java 5. According to the notes at https://code.google.com/p/concurrentlinkedhashmap/, version 1.4 has performance improvements over v1.2. Isn't Solr 4.x built against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
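For reference, a minimal sketch of the ported code on the Guava side: MapMaker's eviction options were deprecated and moved, so the size-bounded equivalent now lives in CacheBuilder (the class and method names are current Guava API; the sizes are arbitrary):

  import com.google.common.cache.Cache;
  import com.google.common.cache.CacheBuilder;

  public class BoundedCacheSketch {
      public static void main(String[] args) {
          // size-bounded concurrent cache, roughly the role CLHM plays as a dependency
          Cache<String, Object> cache = CacheBuilder.newBuilder()
                  .maximumSize(512)        // evicts near-LRU entries past this bound
                  .concurrencyLevel(16)    // internal segment count for concurrent writers
                  .build();

          cache.put("key", "value");
          Object hit = cache.getIfPresent("key");
          System.out.println(hit);
      }
  }

Whether Solr would actually benefit from swapping the library or version is the open question in this thread.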
Re: SolrCloud: How to replicate shard of another machine for failover?
Hi, > Don't run multiple instances of Solr on one machine. Instead, run one > instance per machine and create the collection with the maxShardsPerNode > parameter set to 2 or whatever value you need. Ok. > Yet another whole separate discussion: You need three physical nodes for > a redundant zookeeper, but I see only one host (localhost) in your > zkHost parameter. I know, but thanks for pointing it out. At the moment I'm doing a proof of concept investigating SolrCloud. Properly configuring ZooKeeper comes later. > The way you've set it up, SolrCloud just sees that you have four Solr > instances. It does not know that they are on the same machine. As far > as it is concerned, they are entirely separate. > > Something that would be a good idea is an optional config flag that > would make SolrCloud compare hostnames when building a collection and > avoid putting replicas on nodes where the hostname matches. Whether to > default this option to on or off is a whole separate discussion. That would be a great addition, because as of now I don't see a way of having a reproducible failover mechanism without additional physical machines. Or am I wrong here? Let's say I have two leaders (host1 and host2), each holding one shard of the collection. How can I make sure that host1 will run a replica of host2's shard? Thanks Oliver
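On the explicit-placement question, one approach that works on 4.x without relying on start order: the Core Admin API creates a core attached to a named collection and shard, so a replica can be pinned to a chosen host. A sketch, with the collection, shard and core names made up for illustration:

  # run against host1 to put a replica of shard2 (whose leader lives on host2) onto host1
  curl "http://host1:8983/solr/admin/cores?action=CREATE&name=mycollection_shard2_replica2&collection=mycollection&shard=shard2"

The new core registers itself with ZooKeeper, joins shard2 as a replica, and recovers its index from the shard leader.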
Re: SolrCloud: How to replicate shard of another machine for failover?
> There is a round robin process when assigning nodes at cluster. If you want > to achieve what you want you should change your Solr start up order. Well that is just weird. To bring a cluster to a reproducible state, I have to bring the whole cluster down, and start it up again in a specific order? What order do you suggest, to have a failover mechanism?
Re: Knowing shard value of result
Thanks iorixxx, SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); parameters.set("fl","Id,STATE_NAME,[shard]"); parameters.set("distrib","true"); QueryResponse response = server.query(parameters); It's working fine now. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713p4119720.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Knowing shard value of result
Hi, I think with this : https://wiki.apache.org/solr/DocTransformers#A.5Bshard.5D Ahmet On Wednesday, February 26, 2014 10:36 AM, search engn dev wrote: I have setup solr cloud of two shards and two replicas. I am using solrj for communicating with solr. We are using CloudSolrServer for searching in solr cloud. below is my code String zkHost = "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; CloudSolrServer server = new CloudSolrServer(zkHost); server.connect(); server.setDefaultCollection(defaultCollection); server.setIdField("Id"); SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); QueryResponse response = server.query(parameters); System.out.println(""+response.toString()); I am getting correct response from solr. But how do i know the requested solr hosts. ? because request can go to any live solr host. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html Sent from the Solr - User mailing list archive at Nabble.com.
Knowing shard value of result
I have set up a Solr cloud with two shards and two replicas. I am using SolrJ to communicate with Solr, and CloudSolrServer for searching the cloud. Below is my code: String zkHost = "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; CloudSolrServer server = new CloudSolrServer(zkHost); server.connect(); server.setDefaultCollection(defaultCollection); server.setIdField("Id"); SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); QueryResponse response = server.query(parameters); System.out.println(""+response.toString()); I am getting the correct response from Solr, but how do I know which Solr host served the request? The request can go to any live Solr host. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need feedback: Browsing and searching solr-user list emails
Hi Dmitry, Thanks for your feedback. Couple of inline responses below. On Mon, Feb 24, 2014 at 4:43 AM, Dmitry Kan wrote: > Hello! > > Just few random points: > > 1. Interesting site. I'd say there are similar sites, but this one has > cleaner interface. How does your site compare to this one, for example, in > terms of feature set? > > http://qnalist.com/questions/4640870/luke-4-6-0-released > > At least, the user ranking seems to be different, because on your site > yours truly marked with 5800 points and on the qnalist with 59. > > Looks like a similar idea. UI seems quite different though, as you suggested - seems qnalist is removing all quoted text within emails. We preserve it as it brings "context". Imagine inline responses showing up without quoted text. Seems it is missing "crowdsource" aspect also - votes, favorites, best answers - which are very important for relevancy. Might want to compare search results as well, particularly the "Related questions" under each question. Being able to quickly navigate to similar threads (like StackExchange) is a very powerful way to access content. 2. Do you handle several users, like DmitryKan, DmitryKan-1.. as a single > user, i.e. if I'd post under different e-mail addresses. > Yes, but with administrator's intervention. We combine multiple name identities associated under same email address (may be coming from different email clients) but combining multiple emails addresses needs to be done by admin. > 3. It seems like your site is going to mostly be read only, except for > question / user voting? > Yes, in a short-term. However, one can argue that solr-user type mailing lists are Q&A anyways and SE like forum are better suited for this purpose given they "organize" content little better compared to emails. So if longer term solution for managing such community is Q&A then solution like this "gently" moves people in that direction without asking them to drastically change existing behaviors. > > To me any such site, including yours, will make sense as long as I could > find stuff faster than with Google. > That's probably the key. Even with SE, Google lands you there but once you are on SE, you navigate using its own search and recommendation engine etc. It all boils down to the quality of search ranking and associated UI :) Durgam. > > Dmitry Kan > > > > > > On Tue, Feb 11, 2014 at 7:18 AM, Durgam Vahia wrote: > > > Hi Solr-users, > > > > I wanted to get your thoughts/feedback on a potentially useful way to > > browse and search prior email conversations in > > solr-users@lucenedistribution list. > > > > http://www.signaldump.org/solr/qpod/ > > > > In a nutshell, this is a Q&A engine like StackExchange (SE) > auto-populated > > with solr-users@lucene email threads of past one year. Engine auto-tags > > email threads and creates user profile of participants with points, > badges > > etc. New emails also gets processed automatically and will be placed > under > > the relevant conversation. > > > > Here are some of the advantages that might be useful - > > > >- Like SE, users can "crowdsource" the quality of content by voting, > and > >choosing best answers. > >- You can favorite posts/threads, users, tags to personalize search. > >- Email conversations and Q&A engine work seamlessly together. One can > >use any medium and conversations are still presented in a uniform way. > >- Web UI supports mobile device aspect ratios - just click on above > link > >on your mobile device to get a feel. 
> > > > Do you think this would be useful for the solr-users community? To get a > > feel, try searching the archive before posting in the email list to see > if > > UI makes finding things little gentler. As more people search/view/vote, > > search should become more relevant and personalized. > > > > I would be happy to maintain this for the benefit of the community. > > Currently I have only seeded past one year of email but we could > > potentially go further back if people find this useful. > > > > Thanks and feedback welcome. > > > > And before someone asks - yes, our search engine is Solr .. > > > > Durgam. > > > > > > -- > Dmitry > Blog: http://dmitrykan.blogspot.com > Twitter: twitter.com/dmitrykan >
RE: Performance problem on Solr query on stemmed values
Hi Erick, thank you for the reply. Yes, I'm using the FastVectorHighlighter (Solr 4.3). Every request should deliver only 10 results. Here is my schema configuration for both fields: [the field definitions were stripped from the archived message] The content field contains on average around 5000-6000 words (only a rough estimate). Best regards Erwin -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, February 25, 2014 3:27 PM To: solr-user@lucene.apache.org Subject: Re: Performance problem on Solr query on stemmed values Right, highlighting may have to re-analyze the input in order to return the highlighted data. This will be significantly slower than the search, especially if you have a large number of rows you're returning. You can get better performance in highlighting by using FastVectorHighlighter. See: https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter 1000x is unusual, though, unless your fields are very large or you're returning a lot of documents. Best, Erick On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi wrote: > Hi, > > I would like to know whether anyone has experienced this kind of phenomenon. > > We are having a performance problem with queries on stemmed values. > I've documented the symptoms I'm currently facing:
>
>   Search on field content | Search on field spell | Highlighting (on content field) | Processing speed
>   active                  | active                | active                          | slow
>   active                  | not active            | active                          | fast
>   active                  | active                | not active                      | fast
>   not active              | active                | active                          | slow
>   not active              | active                | not active                      | fast
>
> *Fast means 1000x faster than "slow". > > Field content is our index field, which holds the original text, and spell is the field with the stemmed values. > According to my measurements, searching on both fields (stemmed and not stemmed) is really fast. > But when I add highlighting to the query, it takes too long to process. > > Best Regards > Erwin
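Since the schema fragment above did not survive the archive, a generic sketch of what the FastVectorHighlighter needs may help readers; the field and type names are placeholders, not Erwin's actual configuration:

  <!-- schema.xml: FVH requires term vectors with positions and offsets -->
  <field name="content" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  # request side
  &hl=true&hl.fl=content&hl.useFastVectorHighlighter=true

Without all three termVector* attributes (and a reindex after adding them), Solr falls back to the standard highlighter for that field, which re-analyzes the stored text - consistent with the slow timings whenever highlighting touches the stemmed field.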