Re: Null pointer exception in spell checker at addchecker method
Yes, it worked, and I got the reason for the error. Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-in-spell-checker-at-addchecker-method-tp4105489p4105636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Constantly increasing time of full data import
On Tue, 2013-12-03 at 17:09 +0100, michallos wrote:
> This occurs only on production environment so I can't profile it :-)

Sure you can [Smiley] If you use jvisualvm and stay away from the Profiler tab, then you should be fine. The Sampler performs non-intrusive profiling. Not as thorough as real profiling, but it might help.

So far it sounds like a classic merge issue though. This would probably not show up in the profiler. Have you tweaked the mergeFactor? http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

With 16 shards/node (guessing: same storage backend for all shards on a single node, different storage backends across the nodes) and a 15-second commit time, a segment will be created roughly every second (oversimplifying, as they will cluster, which makes matters worse for spinning drives). If the mergeFactor is 10, this means a merge will be going on every 10 seconds. Merges are bulk IO, and on spinning drives they get penalized by concurrent random access.

Consider doing non-intrusive IO load-logging (bulk as well as IO/sec) on a node. If you see bulk speed go down considerably when the IO/sec rises, then you have your problem. Some solutions are:
- Increase your maxTime for autoCommit
- Increase the mergeFactor
- Use SSDs
- Maybe lower the number of shards to reduce the thrashing triggered by concurrent merges
- More RUM (and more RAM)

Regards, Toke Eskildsen, State and University Library, Denmark
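Toke's segment arithmetic above can be sketched numerically. This is a back-of-the-envelope illustration, not Solr's actual merge scheduler, and the function name is made up for the example:

```python
def merge_interval_seconds(commit_interval_s, shards_per_node, merge_factor):
    # each commit flushes one new segment per shard, so with 16 shards and a
    # 15 s autoCommit, a new segment appears roughly every 15/16 ~ 1 second
    segment_interval = commit_interval_s / shards_per_node
    # once merge_factor segments of similar size accumulate, a merge fires
    return segment_interval * merge_factor

print(merge_interval_seconds(15, 16, 10))  # 9.375
```

With the numbers from the email (15 s commits, 16 shards, mergeFactor 10) this gives a merge roughly every 9-10 seconds, matching the estimate above.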
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I am confused because with surround, no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

On Monday, December 9, 2013 6:24 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:

All, I posted this sub-issue with another issue a few days back, but maybe it was not obvious, so I am posting it on a separate thread. We recently migrated to Solr 4.6. We use CommonGrams, but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding the individual tokens of those words to the query too, which ends up slowing it down. Below is an example:

Query = only be

Here is what debug shows. I have highlighted the part which is different in the two versions, i.e. Solr 4.6 is making it a MultiPhraseQuery and adding individual tokens too. Can someone help?

SOLR 4.6 (takes 20 secs):
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">MultiPhraseQuery(Contents:(only only_be) be)</str>
<str name="parsedquery_toString">Contents:(only only_be) be</str>

SOLR 1.4.1 (takes 1 sec):
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">Contents:only_be</str>
<str name="parsedquery_toString">Contents:only_be</str>

-- Regards, Salman Akram
Re: Faceting within groups
Can you try setting group.truncate to true?

On Sunday, December 8, 2013 3:18 PM, Cool Techi cooltec...@outlook.com wrote: Any help here? From: cooltec...@outlook.com To: solr-user@lucene.apache.org Subject: Faceting within groups Date: Sat, 7 Dec 2013 14:00:20 +0530

Hi, I am not sure if faceting within groups is supported; the documents do seem to suggest it works, but I can't seem to get the intended results.

<str name="q">(Amazon Cloud OR (IBM Cloud)</str>
<str name="group.field">sourceId</str>
<str name="facet.field">sentiment</str>
<str name="group">true</str>
<str name="group.facet">true</str>

Also, if it works, does SolrCloud support it?

Regards, Ayush
Re: alternative to DisMaxRequestHandler needed for upgrade to solr 4.6.0
Thanks guys, that worked.

On 6 December 2013 23:55, Shawn Heisey s...@elyograg.org wrote: On 12/6/2013 8:58 AM, Peri Stracchino wrote: I'm trying to upgrade a Solr installation from 1.4 (yes, really) to 4.6.0, and I find our request handler was solr.DisMaxRequestHandler, which is now not only deprecated but deleted from solr-core-4.6.0.jar. Can anyone advise on suitable alternatives, or was there any form of direct replacement?

Erick is right, you should probably use edismax. In addition, it's important to note a critical distinction here ... it's the *handler* object that's deprecated and removed, not the parser. The old dismax query parser is still alive and well, alongside the new extended dismax query parser. You need to use a standard search request handler and set the defType parameter to dismax or edismax.

http://wiki.apache.org/solr/DisMaxQParserPlugin
http://wiki.apache.org/solr/ExtendedDisMax

I would recommend that you not use /dismax or /edismax for the handler name, just to avoid terminology clashes. I use /ncdismax for my handler name ... the string "nc" has meaning for our web application. Eventually I hope to move all searching to edismax and therefore just use /select or /search for the handler name. Right now we do almost everything with the standard query parser, and we are still tuning edismax. This is my handler definition:

<requestHandler name="/ncdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">70</int>
    <str name="df">catchall</str>
    <xi:include href="shards.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
    <str name="shards.qt">/search</str>
    <str name="shards.info">true</str>
    <str name="shards.tolerant">true</str>
    <float name="tie">0.1</float>
    <int name="qs">3</int>
    <int name="ps">3</int>
    <str name="qf">catchall</str>
    <str name="pf">catchall^2</str>
    <str name="boost">min(recip(abs(ms(NOW/HOUR,pd)),1.92901e-10,1.5,1.5),0.85)</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <bool name="lowercaseOperators">false</bool>
  </lst>
</requestHandler>

Thanks, Shawn
Re: Constantly increasing time of full data import
On production, no, I can't profile it (because of the huge overhead) ... Maybe with dynamic tracing, but we can't do that right now. After a server restart, the delta time resets to 15-20 seconds, so it is not caused by the mergeFactor. We have SSDs and 70GB RAM (it is enough for us).
Indexing on plain text and binary data in a single HTTP POST request
Hi, I am using Solr for searching my email data. My application is in C++, so I am using the cURL library to POST the data to Solr for indexing. I am posting data in XML format; some of the XML fields are plain text and some are in binary format. I want to know what I should do so that Solr can index both types of data (plain text as well as binary) coming in a single XML file. For reference, my XML file looks like:

<add><doc>
<field name="mailbox-id"></field>
<field name="folder">INBOX</field>
<field name="from">solr solr s...@abc.com</field>
<field name="to">solr solr s...@abc.com</field>
<field name="email-body">HI I AM EMAIL BODY\r\n\r\nTHANKS</field>
<field name="email-attachment">Some binary data</field>
</doc></add>

I tried to use ExtractingUpdateProcessorFactory, but it seems to me that ExtractingUpdateProcessorFactory support is not in Solr 4.5 (which I am using), nor in any released Solr version. Also, I think I cannot use ExtractingRequestHandler for my problem, as the document is in XML format and has mixed types of data (text and binary). Am I right? If yes, please suggest how to proceed; if no, how can I extract text using ExtractingRequestHandler from some of the binary fields? Any help is highly appreciated.
Re: Difference between textfield and strfield
Hi Manju, Ahmet is me :) Faceting will be OK with the lowercased field type, even if it is a solr.TextField. KeywordTokenizer keeps its input as a single token, similar behavior to a string field. With solr.TextField + KeywordTokenizer you can add further token filters, for example a lowercase filter. With the string type you cannot add any token filters. As Erick suggested, you can play with field types on the Admin analysis page. It allows you to enter sample text and displays the generated tokens visually.

On Sunday, December 8, 2013 2:00 PM, manju16832003 manju16832...@gmail.com wrote: I don't understand. "Use the field type *Ahmet* recommended." Who is Ahmet?
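As a rough illustration of Ahmet's point, here is a Python simulation of that analysis chain (this is not Solr's implementation, just a sketch of its behavior): KeywordTokenizer emits the whole input as one token, and a lowercase filter then normalizes it, so faceting sees a single lowercased value per field value:

```python
def keyword_lowercase_analyzer(text):
    # KeywordTokenizer: the entire input is a single token (no splitting)
    tokens = [text]
    # LowerCaseFilter: lowercase each token
    return [t.lower() for t in tokens]

print(keyword_lowercase_analyzer("New York"))  # ['new york']
```

A plain string field would behave like the first step only, with no way to add the lowercasing step.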
Re: Constantly increasing time of full data import
On Mon, 2013-12-09 at 11:29 +0100, michallos wrote: on production - no I can't profile it (because of huge overhead) ... Maybe with dynamic tracing but we can't do it right now. https://blogs.oracle.com/nbprofiler/entry/visualvm_1_3_released First section: Sampler In The Core Tool. After server restart, delta time reset to 15-20 seconds so it is not caused by the mergeFactor. Unless your merges are cascading so that the amount of concurrent merges is growing. But with fast storage and a lot of RAM for write cache, that does not sound probable. We have SSD and 70GB RAM (it is enough for us). Sounds like more than enough for a 120GB index. - Toke Eskildsen, State and University Library, Denmark
Re: Indexing on plain text and binary data in a single HTTP POST request
Not a solution, but a couple of thoughts: 1) For your email address fields, you are escaping the angle brackets, right? Not just solr solr s...@abc.com as you show, but with the < and > escaped, right? Otherwise, those email addresses become part of the XML markup and mess it all up. 2) Your binary content is encoded in some way inside the XML, right? Not just raw binary, which would make it invalid XML? Like base64 or something? 3) I suspect you will need to use an UpdateRequestProcessor one way or another: to decode base64 as a first step, and to feed it through whatever you want to process the actual binary with as a second step. So, it might be a custom URP with similar functionality to ExtractingRequestHandler, with the difference that you already have a document object and you are mapping one (binary) field in it into a bunch of other fields, with some conventions on names, overrides, etc. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Dec 9, 2013 at 5:55 PM, neerajp neeraj_star2...@yahoo.com wrote: Hi, I am using Solr for searching my email data. My application is in C++ so I am using the cURL library to POST the data to Solr for indexing. I am posting data in XML format and some of the XML fields are in plain text and some of the fields are in binary format. I want to know what should I do so that Solr can index both types of data (plain text as well as binary data) coming in a single XML file. 
For reference, my XML file looks like:

<add><doc>
<field name="mailbox-id"></field>
<field name="folder">INBOX</field>
<field name="from">solr solr s...@abc.com</field>
<field name="to">solr solr s...@abc.com</field>
<field name="email-body">HI I AM EMAIL BODY\r\n\r\nTHANKS</field>
<field name="email-attachment">Some binary data</field>
</doc></add>

I tried to use ExtractingUpdateProcessorFactory, but it seems to me that ExtractingUpdateProcessorFactory support is not in Solr 4.5 (which I am using), nor in any released Solr version. Also, I think I cannot use ExtractingRequestHandler for my problem, as the document is in XML format and has mixed types of data (text and binary). Am I right? If yes, please suggest how to proceed; if no, how can I extract text using ExtractingRequestHandler from some of the binary fields? Any help is highly appreciated.
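Following Alexandre's point (2) above: a minimal sketch of encoding a binary attachment so it can be carried inside an XML field. Base64 is one common choice; the helper name here is made up for the example:

```python
import base64

def attachment_to_xml_text(raw_bytes):
    # raw binary is not valid XML character data; base64 turns it into
    # plain ASCII that can sit safely inside a <field> element
    return base64.b64encode(raw_bytes).decode("ascii")

print(attachment_to_xml_text(b"hi"))  # aGk=
```

A custom UpdateRequestProcessor on the Solr side would then base64-decode the field value before handing the bytes to whatever extraction step follows.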
Searching for document by id in a sharded environment
Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard), e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer: the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix it with a backslash, or enclose the id within quotes). We're keen to avoid this, as it would require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
Re: Searching for document by id in a sharded environment
Hi Daniel, TermQueryParser comes in handy when you don't want to escape:

q={!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475

On Monday, December 9, 2013 2:14 PM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote: Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard) e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer - the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix with a backslash, or enclose the id within quotes). We're keen to avoid this, as this will require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
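For completeness, if one did want to escape instead of using {!term}, the special query characters can be backslash-escaped client-side. A small sketch modeled loosely on SolrJ's ClientUtils.escapeQueryChars (the Python function itself is hypothetical, written for illustration):

```python
# characters with special meaning to the Lucene query parser
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    # prefix every special character with a backslash
    return ''.join('\\' + ch if ch in SPECIAL_CHARS else ch for ch in value)

print(escape_query_chars("abc!def"))  # abc\!def
```

The {!term} approach avoids this entirely, since the term query parser takes the value verbatim with no query-syntax interpretation.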
Re: Resolve an issuse with SOLR
Hi Munusamy, A typical core directory contains a conf/ folder and a data/ folder. The conf directory should contain solrconfig.xml and schema.xml. You should have a folder with the same name as the instanceDir parameter on the admin UI. Inside this folder the conf/ and data/ directories should exist. So you first need to have the directory present, with the solrconfig and schema files, and then go to the core admin page and create a core.

On Mon, Dec 9, 2013 at 12:45 PM, Munusamy, Kannan kannan.munus...@capgemini.com wrote: Hi, I have used the "+add core" option in the admin UI, but I am not able to add a core. It then showed: "HTTP Status 500 - {msg=SolrCore 'new_core' is not available due to init failure: Path must not end with /" ... Once I restarted the Solr service, I am now getting this error in the UI: "Unable to load environment info from /solr/collection1_shard1_replica1/admin/system?wt=json. This interface requires that you activate the admin request handlers in all SolrCores by adding the following configuration to your solrconfig.xml:" PFA error image. Please provide suggestions and help us resolve the issue. Thanks & Regards, Kannan Munusamy | Capgemini India | Bangalore | kannan.munus...@capgemini.com | www.in.capgemini.com
-- Regards, Varun Thacker http://www.vthacker.in/
Is it possible to retain the indexed data from solr
I am implementing Solr search in my application. I am indexing the data from a MySQL server to an XML file, using Solr 1.4. My questions are: 1. Is it possible to retain the indexed XML data in a CSV or PDF file? 2. Is it possible to save the data from the indexed XML to the MySQL server? For example, if I am indexing an XML file manually (not from the MySQL server), is there any chance to save the indexed data to the MySQL server? Thanks
Multi part field
I am trying to implement a ranged field type in a booking system. The price structure is variable between two dates (determined by the property owner), so it looks like this:

Date A - Date B = Price Value

I've been looking through a lot of docs, but so far have not been able to find how I could implement such an object within Solr. The only thing I have so far thought of is to have two fields, DATE PRICE RANGE and PRICE RANGE VAL, then get the index of the DATE PRICE RANGE array element that matches and apply that to PRICE RANGE VAL to get the value. Any help would be very much appreciated, as this is make or break for the new search system for our site just now.
Re: Multi part field - EXAMPLE DATA
"prices": [
  {"start-date": "05-01-2013", "end-date": "02-03-2013", "price": 760},
  {"start-date": "02-03-2013", "end-date": "06-04-2013", "price": 800},
  {"start-date": "06-04-2013", "end-date": "01-06-2013", "price": 1028},
  {"start-date": "01-06-2013", "end-date": "29-06-2013", "price": 1240},
  {"start-date": "29-06-2013", "end-date": "06-07-2013", "price": 1340},
  {"start-date": "06-07-2013", "end-date": "10-08-2013", "price": 1678},
  {"start-date": "10-08-2013", "end-date": "24-08-2013", "price": 1578},
  {"start-date": "24-08-2013", "end-date": "31-08-2013", "price": 1340},
  {"start-date": "31-08-2013", "end-date": "21-09-2013", "price": 1240},
  {"start-date": "21-09-2013", "end-date": "19-10-2013", "price": 1028},
  {"start-date": "19-10-2013", "end-date": "02-11-2013", "price": 800},
  {"start-date": "02-11-2013", "end-date": "11-01-2014", "price": 760}
]
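One way to realize the two-parallel-fields idea from the earlier message: index start dates, end dates, and prices as aligned multivalued fields, and resolve the price for a given date by finding the range that contains it. A client-side Python sketch under that assumption (dates in the sample data appear to be dd-mm-yyyy; the function name is made up for the example):

```python
from datetime import datetime

def price_for(date_str, starts, ends, prices):
    # parallel arrays: the i-th start/end pair owns the i-th price
    d = datetime.strptime(date_str, "%d-%m-%Y")
    for start, end, price in zip(starts, ends, prices):
        s = datetime.strptime(start, "%d-%m-%Y")
        e = datetime.strptime(end, "%d-%m-%Y")
        if s <= d < e:  # the ranges abut, so treat the end date as exclusive
            return price
    return None

starts = ["05-01-2013", "02-03-2013", "06-04-2013"]
ends   = ["02-03-2013", "06-04-2013", "01-06-2013"]
prices = [760, 800, 1028]
print(price_for("15-03-2013", starts, ends, prices))  # 800
```

The same lookup could of course live in Solr itself (e.g. via a custom function), but aligning the multivalued fields by position is the part that makes the structure recoverable.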
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Yup, on debugging I found that it's coming from the analyzer. We are using StandardAnalyzer. It seems to be a Solr 4 issue with CommonGrams; not sure if it's a bug or if I am missing some config.

On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

-- Regards, Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
But again, as Ahmet mentioned, it doesn't look like the surround query parser is actually being used. The debug output also mentions which query parser was used, but that part wasn't provided below. One thing to note here: the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says the query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as calling the analysis request handler to parse the query before sending it to the surround query parser).

Erik

On Dec 9, 2013, at 9:36 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Yup, on debugging I found that it's coming from the analyzer. We are using StandardAnalyzer. It seems to be a Solr 4 issue with CommonGrams. Not sure if it's a bug or if I am missing some config.

-- Regards, Salman Akram
Getting Solr Document Attributes from a Custom Function
Hi All, I have written a custom Solr function and I would like to read a property of the document inside my custom function. Is it possible to get that using Solr? For example, inside the floatVal method I would like to get the value of the attribute "name":

public class CustomValueSource extends ValueSource {
    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) {
                /* want: getDocument(doc).getAttribute("name") */
            }
        };
    }
}

Thanks & Regards, Mukund
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I have a new question about this issue. I create filter queries of the form:

fq=start_time:[* TO NOW/5MINUTE]

This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start sometime in the future. Nearly all of my queries include this. Would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue!

We have a webapp running with a very large heap (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you need a sufficiently recent Java 6 update (I couldn't find the exact update number, but the latest should cover it). To use it:

1. Remove all the GC options you have, and...
2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50

As a test, of course; more information in the following (interesting) article. We also have Solr running with these options: no more pauses or heap size hitting the sky. Don't get bored reading the first (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido.

On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting, on multiple fields in fact. We have different kinds of Solr configurations: our news searches do little with regard to faceting but sort heavily, while our classified ad searches heavily use faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue, and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.

My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. 
I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael

-----Original Message----- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache

I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space; all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that is a call into a synchronized() block, which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events, but even with replication disabled this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. Here the old generation is collected in one massive collection event (several gigabytes' worth); the other data center is more sawtoothed and collects only 500MB-1GB at a time. 
Here's my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS Incremental mode as we have 8 cores and remarks on the internet suggest that it is only for smaller SMP setups. Removing CMS did not fix anything. I've considered that the heap is way too large (30GB from 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, seems like I might be able to reduce down to 22GB safely
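On the fq=start_time:[* TO NOW/5MINUTE] question earlier in this thread: rounding NOW to a 5-minute boundary is what makes the filter cacheable at all, since every query issued inside the same bucket produces the identical filter string and can hit the same filterCache entry; only the first query after a boundary (or after a new searcher opens) pays the cost of rebuilding the filter. A toy illustration of the bucketing arithmetic (plain Python, not Solr code):

```python
def five_minute_bucket(epoch_seconds):
    # NOW/5MINUTE: round down to the enclosing 5-minute boundary, so all
    # queries in the same window share one cached filter
    return epoch_seconds - (epoch_seconds % 300)

print(five_minute_bucket(1001))  # 900
print(five_minute_bucket(1199))  # 900  (same bucket, same cached filter)
print(five_minute_bucket(1200))  # 1200 (new bucket, filter rebuilt once)
```

An unrounded NOW would make every fq string unique, so nothing would ever be reused from the filter cache.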
Re: [Solr Wiki] Your wiki account data
Hello, Is this email address still valid? Kind Regards

2013/12/4 Mehdi Burgy gla...@gmail.com: Hello, We've recently launched a job search engine using Solr, and would like to add it here: https://wiki.apache.org/solr/PublicServers Would it be possible to allow me to be part of the publishing group? Thank you for your help. Kind Regards, Mehdi Burgy New Job Search Engine: www.jobreez.com

-- Forwarded message -- From: Apache Wiki wikidi...@apache.org Date: 2013/12/4 Subject: [Solr Wiki] Your wiki account data To: Apache Wiki wikidi...@apache.org Somebody has requested to email you a password recovery token. If you lost your password, please go to the password reset URL below or go to the password recovery page again and enter your username and the recovery token. Login Name: madeinch
SolrCloud 4.6.0 - leader election issue
Hello, I am using SolrCloud 4.6.0 with two shards, two replicas per shard, and two collections.

collection fr_blue:
- shard1: server-01 (replica1), server-01 (replica2)
- shard2: server-02 (replica1), server-02 (replica2)

collection fr_green:
- shard1: server-01 (replica1), server-01 (replica2)
- shard2: server-02 (replica1), server-02 (replica2)

If I start the four Solr instances without a delay between each start, it is not possible to connect to them and it is not possible to access the Solr Admin page. If I get the clusterstate.json with zkCli, the statuses are:
- active for the leaders of the first collection
- recovering for the other replicas of the first collection
- down for all replicas of the second collection (no leader)

The logs loop on the following messages:

server-01:
2013-12-09 14:41:28,634 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010063, packet:: clientPath:null serverPath:null finished:false header:: 568,4 replyHeader:: 568,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard1,F response::
2013-12-09 14:41:28,635 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010064, packet:: clientPath:null serverPath:null finished:false header:: 372,4 replyHeader:: 372,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard2,F response::

server-02:
2013-12-09 14:41:51,381 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1014,4 replyHeader:: 1014,483813,0 request:: '/s6fr/overseer_elect/leader,F response:: 
#7b226964223a2239303837313832313732343837363839342d6463312d76742d6465762d78656e2d30362d766d2d30362e6465762e6463312e6b656c6b6f6f2e6e65743a383038375f736561726368736f6c726e6f646566722d6e5f30303030303030303634227d,s{483632,483632,1386599789203,1386599789203,0,0,0,90871821724876894,104,0,483632} 2013-12-09 14:41:51,383 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1015,8 replyHeader:: 1015,483813,0 request:: '/s6fr/overseer/queue,F response:: v{} 2013-12-09 14:41:51,385 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1016,8 replyHeader:: 1016,483813,0 request:: '/s6fr/overseer/queue-work,F response:: v{} After 10 minutes, there is a WARN message, a leader is found for the second collection and it is possible to connect to the solr instances: 2013-12-06 21:17:57,635 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:212 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... 
(live nodes size: 4) 2013-12-06 21:27:58,719 [coreLoadExecutor-4-thread-2] WARN org.apache.solr.update.PeerSync:handleResponse:322 - PeerSync: core=fr_green url=http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr exception talking to http://dc1-vt-dev-xen-06-vm-06.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/, failed org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: incref on a closed log: tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.001 refcount=1} at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) 2013-12-06 21:27:58,730 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.SyncStrategy:syncReplicas:134 - Leader's attempt to sync with shard failed,
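One low-tech mitigation implied above ("without a delay between each start") can be scripted. The sketch below only echoes hypothetical start commands (the ports, the zkHost value, and the start.jar path are placeholders, not taken from the original post), so the staggering logic is visible without actually launching anything:

```shell
# Dry-run sketch: print a staggered start sequence for four instances.
# 'sleep 1' keeps the demo fast; in practice you would wait considerably
# longer between starts, or until each node reports itself as live.
for port in 8081 8082 8083 8084; do
  echo "java -Djetty.port=${port} -DzkHost=zkhost:2181 -jar start.jar"
  sleep 1
done
```

In a real deployment the echoed command would be executed instead of printed, with each instance's own port and data directory.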
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I was trying to locate the release notes for 3.6.x, but it is too old. If I were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you since it is a minor release. Locate the release notes and see if something that is affecting you got fixed. I would also think about moving to 4.x, which is quite stable and fast. Like anything with Java and concurrency, it will just get better (and faster) as concurrency frameworks become more reliable, standard and stable. Regards, Guido. On 09/12/13 15:07, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start sometime in the future. Nearly all of my queries include this; would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue! We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you need a recent Java 6 update (I couldn't find the exact number, but the latest should cover it): 1. Remove all the GC options you have and... 2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50 As a test, of course; you can read more in the following (and interesting) article. We also have Solr running with these options: no more pauses or heap size hitting the sky. Don't get bored reading the first (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. 
We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running into a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. 
We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus the others. The old generation is collected as a massive collect event (several gigabytes worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes
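Guido's two-step suggestion earlier in the thread, applied to the launch command quoted above, amounts to swapping the five CMS/ParNew flags for two G1 flags. A minimal sketch (the 50 ms pause target is his example value, not a tuned recommendation):

```shell
# G1 options from Guido's suggestion, collected in one variable so they can
# be spliced into the java launch command in place of:
#   -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode \
#   -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalPacing
JAVA_GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
echo "$JAVA_GC_OPTS"
```

For example: /usr/java/jre/bin/java -verbose:gc -XX:+PrintGCDetails -server $JAVA_GC_OPTS -Xms30720M -Xmx30720M ... (keeping the rest of the original command unchanged).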
Re: Indexing on plain text and binary data in a single HTTP POST request
Hi Alexandre, Thanks very much for responding to my post. Please find my responses in-line: 1) For your email address fields, you are escaping the brackets, right? Not just solr solr [hidden email] as you show, but with the < and > escaped, right? Otherwise, those email addresses become part of the XML markup and mess it all up. [Neeraj]: Yes, you are right. I used CDATA for escaping < and > and any other special characters in the XML. 2) Your binary content is encoded in some way inside the XML, right? Not just raw binary, which would make it invalid XML? Like base64 or something? [Neeraj]: I want to use raw binary (*not base64 encoded*) in some of the XML fields inside a CDATA section so that the XML will not become invalid. I hope I can do this. 3) Decode the base64 as a first step and feed it through whatever you want to process the actual binary with as a second step. So, it might be a custom URP, with similar functionality to ExtractingRequestHandler, with the difference that you already have a document object and you are mapping one - binary - field in it into a bunch of other fields, with some conventions on names, overrides, etc. [Neeraj]: My XML document contains some fields in plain text and some fields in raw binary format. I tried to use ExtractingUpdateProcessor but soon came to know that the same is not rolled out in Solr 4.5. I am not sure how to use ExtractingRequestHandler for an XML document having some fields in plain text and some fields in raw binary format. It seems to me that ExtractingRequestHandler is used to extract text from a binary file input, but my input document is in XML format, not binary. I am new to Solr so I need your valuable suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105706.html Sent from the Solr - User mailing list archive at Nabble.com.
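A note of caution on the plan above: CDATA does not make arbitrary binary safe, because XML 1.0 forbids most control characters even inside a CDATA section, and the byte sequence ]]> would terminate the section early. Base64-encoding the field, as Alexandre assumed, is the usual approach. A minimal sketch (the field name is hypothetical):

```shell
# base64-encode binary content before embedding it in a Solr <add> document;
# 'hello' stands in for real binary data here.
payload=$(printf 'hello' | base64)
echo "<field name=\"attachment_bin\">${payload}</field>"
```

The receiving side (a custom URP, as suggested in point 3) would then base64-decode that field before handing the bytes to whatever does the binary processing.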
Re: JVM crashed when start solr
What are your Solr startup parameters (Java options)? You can assign more memory to the JVM by specifying -Xmx10g or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SolrCloud 4.6.0 - leader election issue
I can confirm I've seen this issue as well on trunk, a very recent build. -Original message- From: Elodie Sannier elodie.sann...@kelkoo.fr Sent: Monday 9th December 2013 16:43 To: solr-user@lucene.apache.org Cc: search5t...@lists.kelkoo.com Subject: SolrCloud 4.6.0 - leader election issue [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
If you want a start time within the next 5 minutes, I think your filter is not the right one; * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart On Monday, 9 December 2013 at 09:07 -0600, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] [...]
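For reference, the two filter forms in this exchange interact differently with Solr's filterCache: NOW/5MINUTE rounds down, so the rounded form produces one cache entry per 5-minute window, with a rebuild against the whole index at every window boundary (which lines up with the periodic stalls described earlier). An unrounded NOW would instead produce a brand-new, effectively uncacheable entry on every request. A hedged sketch of the variants (the {!cache=false} form is an assumption worth testing on this Solr version; it bypasses the filterCache so no shared rebuild is triggered at the boundary):

```text
# One filterCache entry per 5-minute window; rebuilt at each boundary
fq=start_time:[* TO NOW/5MINUTE]

# Franck's variant: only items starting within the next 5 minutes
fq=start_time:[NOW TO NOW+5MINUTE]

# Assumed mitigation: skip the filterCache for this clause entirely
fq={!cache=false}start_time:[* TO NOW/5MINUTE]
```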
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Did you add the garbage collection JVM options I suggested? -XX:+UseG1GC -XX:MaxGCPauseMillis=50 Guido. On 09/12/13 16:33, Patrick O'Lone wrote: Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well and the issue ALWAYS happens - no matter what I do. If I'm taking heavy traffic (200 requests per second) - as soon as I hit a 5 minute mark - the world stops - garbage collection would be less predictable. Nearly all of my requests have this 5 minute windowing behavior on time though, which is why I have it as a strong suspect now. If it blocks on that - even for a couple of seconds, my traffic backlog will be 600-800 requests. Did you add the Garbage collection JVM options I suggested you? -XX:+UseG1GC -XX:MaxGCPauseMillis=50 Guido. On 09/12/13 16:33, Patrick O'Lone wrote: Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. I was trying to locate the release notes for 3.6.x it is too old, if I were you I would update to 3.6.2 (from 3.6.1), it shouldn't affect you since it is a minor release, locate the release notes and see if something that is affecting you got fixed, also, I would be thinking on moving on to 4.x which is quite stable and fast. Like anything with Java and concurrency, it will just get better (and faster) with bigger numbers and concurrency frameworks becoming more and more reliable, standard and stable. Regards, Guido. On 09/12/13 15:07, Patrick O'Lone wrote: I have a new question about this issue - I create a filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this, would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! 
We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to replace sometime in the future the CMS GC, but you have to have Java 6 update Some number I couldn't find but latest should cover to be able to use: 1. Remove all GC options you have and... 2. Replace them with /-XX:+UseG1GC -XX:MaxGCPauseMillis=50/ As a test of course, more information you can read on the following (and interesting) article, we also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article, page 2 and 3 will make lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. 
The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus others.
Re: Indexing on plain text and binary data in a single HTTP POST request
On 12/9/2013 9:20 AM, neerajp wrote: I tried to use ExtractingUpdateProcessor but soon came to know that the same is not rolled out in solr 4.5 I am not sure how to use ExtractingRequestHandler for an XML document having some of the fields in plain text and some of the fields in random binary format. It seems to me that ExtractingRequestHandler is used to extract text from a binary file input but my input document is in XML format not binary. ExtractingRequestHandler is a contrib module. It's not included in the Solr application war itself, but it IS in the download. You can find the jars in contrib/extraction/lib in all 4.x versions, including 4.5, 4.5.1, and 4.6. Thanks, Shawn
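For reference, a sketch of how the contrib jars are usually wired in via lib directives in solrconfig.xml (the dir paths here are placeholders relative to the core's instance dir; adjust them to your actual layout):

```xml
<!-- load Solr Cell (ExtractingRequestHandler) and its Tika dependencies;
     paths are illustrative and depend on where your Solr download lives -->
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar"/>
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar"/>
```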
Re: JVM crashed when start solr
Hi Michael, Thank you for your response. I start Solr with the following command line: java -Xms10240m -Xmx20480m -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=node4:9983 -DnumShards=3 -jar start.jar It doesn't work any more; the Solr server crashes when the memory usage of the server rises to 5G. 2013/12/10 michael.boom my_sky...@yahoo.com What are your Solr startup parameters (java options)? You can assign more memory to the JVM by specifying -Xmx10g or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Bad fieldNorm when using morphologic synonyms
In order to set discountOverlaps to true you must have added <similarity class="solr.DefaultSimilarityFactory"/> to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected even with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class that initializes the param to true. Cheers, Manu
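For concreteness, the explicit declaration being discussed would look something like this in schema.xml (a sketch; the discountOverlaps line just states the intended value explicitly, and per later messages in this thread it is the explicit declaration that makes the factory's init run at all):

```xml
<!-- declare the similarity factory explicitly so its init() is called
     and discountOverlaps is actually applied -->
<similarity class="solr.DefaultSimilarityFactory">
  <bool name="discountOverlaps">true</bool>
</similarity>
```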
RE: JVM crashed when start solr
you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. From: Wukang Lin vboylin1...@gmail.com Sent: Monday, December 09, 2013 09:19 To: solr-user@lucene.apache.org Subject: Re: JVM crashed when start solr Hi michael, Thank you for you response. I start solr with follow command line: java -Xms10240m -Xmx20480m -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=node4:9983 -DnumShards=3 -jar start.jar It doesn't work any more. the solr server crashed when the memory usage of the server raise up to 5G. 2013/12/10 michael.boom my_sky...@yahoo.com Which are you solr startup parameters (java options) ? You can assign more memory to the JVM by specifying -Xmx=10G or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like: fq=start_time:[* TO NOW+5MINUTE/5MINUTE] so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second past that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like: fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day. If you want a start time within the next 5 minutes, I think your filter is not the right one. * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart On Monday, December 9, 2013 at 09:07 -0600, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this, would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to replace sometime in the future the CMS GC, but you have to have Java 6 update Some number I couldn't find but latest should cover to be able to use: 1. Remove all GC options you have and... 2.
Replace them with /-XX:+UseG1GC -XX:MaxGCPauseMillis=50/ As a test of course, more information you can read on the following (and interesting) article, we also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article, page 2 and 3 will make lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go to as high as 90. 
It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus others. The old generation is collected as a massive collect event (several gigabytes worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments): /usr/java/jre/bin/java \
Re: [Solr Wiki] Your wiki account data
: Is this email address still valid? : : Kind Regards Mehdi: i don't understand your question, particularly in the context of the thread you are replying to. On Dec 4, you asked if your wiki id (madeinch) could be added to the editing group for the solr wiki, and Erick Erickson replied on the same day that he did that. You now have the ability to edit the wiki using that wiki account, but if you are having problems logging into that account that may be a separate problem? (It's not clear what you were asking about when you forwarded the password recovery email below) : 2013/12/4 Mehdi Burgy gla...@gmail.com : : Hello, : : We've recently launched a job search engine using Solr, and would like to : add it here: https://wiki.apache.org/solr/PublicServers : : Would it be possible to allow me be part of the publishing group? : : Thank you for your help : : Kind Regards, : : Mehdi Burgy : New Job Search Engine: : www.jobreez.com : : -- Forwarded message -- : From: Apache Wiki wikidi...@apache.org : Date: 2013/12/4 : Subject: [Solr Wiki] Your wiki account data : To: Apache Wiki wikidi...@apache.org : : : : Somebody has requested to email you a password recovery token. : : If you lost your password, please go to the password reset URL below or : go to the password recovery page again and enter your username and the : recovery token. : : Login Name: madeinch : : : : -Hoss http://www.lucidworks.com/
Displaying actual field values and searching lowercase ignoring spaces
Values of the field [street] in my DB may be "Castle Road". However, I want to be able to find these values using lowercase including dashes, so "castle-road" would be a match. When I use fieldtype text_lower_space, which holds a solr.WhitespaceTokenizerFactory, the value is split into 2 tokens, "Castle" and "Road". When I use type string of fieldtype solr.StrField, I cannot search lowercase and still find values which hold uppercase characters, such as "Castle Road". I need to be able to find values (regardless of their casing) using a lowercase query. I will be using the [street] field to display facets, so the text displayed to the user should be the exact value including casing from field [street]; however, when I search on the field, "castle-road" should return a match.

original value / query that should find it:
Castle Road / castle-road
Oak-tree lane / oak-tree-lane

The problem now is that I don't know which tokenizer I need to use, both for index and query.

<fieldType name="text_lower_space" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/Displaying-actual-field-values-and-searching-lowercase-ignoring-spaces-tp4105723.html Sent from the Solr - User mailing list archive at Nabble.com.
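One possible direction, sketched under my own assumptions (not a tested answer from the thread): normalize whitespace and hyphens to one canonical separator at analysis time, so "Castle Road" and "castle-road" index and query to the same single token. The stored value is untouched by analysis, so the original casing survives for display:

```xml
<!-- hypothetical fieldType: collapse spaces and hyphens into one
     separator, keep the value as a single lowercased token -->
<fieldType name="street_exact" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\s-]+" replacement="-"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since faceting works on indexed terms rather than stored values, you would typically keep a separate string copy of [street] for facet display and search against a field of this type via copyField.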
Re: JVM crashed when start solr
On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Re: passing SYS_REFCURSOR as out parameter for Oracle stored procedure
I would probably do something like create a function that calls your stored procedure and returns the result, and then call TABLE() on the result of your function so that DataImportHandler gets something that looks like a table to it. I'm not sure that DataImportHandler is set up to deal with cursors or out parameters. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Fri, Dec 6, 2013 at 5:18 AM, aniljayanti aniljaya...@yahoo.co.in wrote: Hi, I am using solr 3.3 for index generation with sql server, generating index successfully, now I am trying to generate with Oracle DB. I am using the *UDP_Getdetails* procedure to generate the required indexes. This procedure takes 2 input and 1 output parameter. *input params : id name output params : cv_1 IN OUT SYS_REFCURSOR* In solr, data-config.xml, below is my configuration. *entity name=index query=UDP_Getdetails(32,'GT', ); * I do not know how to pass a *SYS_REFCURSOR* to the procedure in solr. Please help me out of this. Thanks in Advance, Aniljayanti -- View this message in context: http://lucene.472066.n3.nabble.com/passing-SYS-REFCURSOR-as-out-parameter-for-Oracle-stored-procedure-tp4105307.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
: But it still has the error about TrimFilterFactory in it, which I reported a couple of days back. Bernd, thanks for reporting this -- I did not notice your email when you initially sent it, but it was after the vote for the RC began anyway, and was not brought up in the VOTE thread as a blocker. I've updated the docs to fix this... https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions In the future, if you have comments/suggestions about doc improvements, please post them as comments in the ref guide -- that not only makes them directly accessible by people reviewing the online copy, but also helps them stand out better when folks are reviewing the docs for bugs just prior to release. thanks again for catching this. -Hoss http://www.lucidworks.com/
RE: JVM crashed when start solr
aah good to know. i hadn't seen any issues on our solr 4.5.1 setups with 7u45 yet but perhaps we've just been lucky so far. From: Shawn Heisey s...@elyograg.org Sent: Monday, December 09, 2013 09:46 To: solr-user@lucene.apache.org Subject: Re: JVM crashed when start solr On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Re: Indexing on plain text and binary data in a single HTTP POST request
On 09 Dec 2013, at 17:20 , neerajp neeraj_star2...@yahoo.com wrote: 2) Your binary content is encoded in some way inside XML, right? Not just random binary, which would make it invalid XML? Like base64 or something? [Neeraj]: I want to use random binary(*not base64 encoded*) in some of the XML fields inside CDATA tag so that XML will not become invalid. I hope I can do this. You can't – there are binary values that are simply not acceptable in an XML stream. Encoding the binary is the canonical way around this. That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things).
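The point about binary values bears a quick illustration (a generic sketch, nothing Solr-specific): base64 turns arbitrary bytes, including ones like 0x00 that no XML document may contain even inside CDATA, into plain ASCII that round-trips safely through an XML field:

```python
import base64

raw = bytes([0x00, 0xFF, 0x10, 0x80])           # arbitrary binary; 0x00 is never valid XML text
encoded = base64.b64encode(raw).decode("ascii")  # safe to embed as an XML field value
decoded = base64.b64decode(encoded)              # the consumer reverses it after retrieval
```

The cost is roughly a 4/3 size increase, which is why sending the binary as a separate multipart stream to /update/extract can be preferable for large payloads.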
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
Can we please give some thought to producing these manuals in ebook formats? On Mon, Dec 2, 2013 at 12:28 PM, Chris Hostetter hoss...@apache.org wrote: The Lucene PMC is pleased to announce the release of the Apache Solr Reference Guide for Solr 4.6. This 347 page PDF serves as the definitive users manual for Solr 4.6. The Solr Reference Guide is available for download from the Apache mirror network: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ (If you have followup questions, please send them only to solr-user@lucene.apache.org) -Hoss
Re: Getting Solr Document Attributes from a Custom Function
Smells like an XY problem ... Can you please describe what your end goal is in writing a custom function, and what you would do with things like the name field inside your function? In general, accessing stored field values for indexed documents can be prohibitively expensive; it rather defeats the entire point of the inverted index data structure. If you help us understand what your goal is, people may be able to offer performant suggestions. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 : Date: Mon, 9 Dec 2013 20:24:15 +0530 : From: Mukundaraman valakumaresan muk...@8kmiles.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Getting Solr Document Attributes from a Custom Function : : Hi All, : : I have a written a custom solr function and I would like to read a property : of the document inside my custom function. Is it possible to get that using : Solr? : : For eg. inside the floatVal method, I would like to get the value of the : attribute name : : public class CustomValueSource extends ValueSource { : : @Override : public FunctionValues getValues(Map context, : AtomicReaderContext readerContext) throws IOException { : return new FloatDocValues(this) { @Override public float floatVal(int doc) : { : /*** : getDocument(doc).getAttribute(name) : : / }}} : : Thanks Regards : Mukund : -Hoss http://www.lucidworks.com/
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
: Can we please give some thought to producing these manuals in ebook formats? People have given it thought, but it's not as simple as just snapping our fingers and making it happen. If you would like to contribute to the effort of figuring out the how/where/what to make this happen, there is an existing jira for discussing it. https://issues.apache.org/jira/browse/SOLR-5467 -Hoss http://www.lucidworks.com/
Re: Bad fieldNorm when using morphologic synonyms
no, it's turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: Bad fieldNorm when using morphologic synonyms
Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
Is it possible to export the doc into markdown? - Original Message - From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Monday, December 9, 2013 14:00:34 Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6 : Can we please give some thought to producing these manuals in ebook formats? People have given it thought, but it's not as simple as just snapping our fingers and making it happen. If you would like to contribute to the effort of figuring out the how/where/what to make this happen, there is an existing jira for discussing it. https://issues.apache.org/jira/browse/SOLR-5467 -Hoss http://www.lucidworks.com/ III International Winter School at UCI, February 17-28, 2014. See www.uci.cu
Re: LocalParam for nested query without escaping?
If so, can someone suggest how a query should be escaped (securely and correctly)? Should I escape the quote mark (and the backslash mark itself) only? On Fri, Dec 6, 2013 at 2:59 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Obviously, there is the option of an external parameter ({... v=$nestedq}&nestedq=...) This is a good solution, but it is not practical when having a lot of such nested queries. Any ideas? On Friday, December 6, 2013, Isaac Hebsh wrote: We want to set a LocalParam on a nested query. When querying with the v inline parameter, it works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""} the parsedquery_toString is +id:TERM1 +(text:term2 text:term3 text:"term4 term5") Query using _query_ also works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\"" (parsedquery is exactly the same). BUT, when trying to put the nested query in place, it yields a syntax error: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5") org.apache.solr.search.SyntaxError: Cannot parse '(TERM2' The previous options are less preferred because of the escaping that has to be done on the nested query. Can't I set a LocalParam on a nested query without escaping the query?
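For the escaping question, one plausible rule (my own assumption, not confirmed by anyone in the thread): inside a quoted v="..." local param, escape backslashes first and then double quotes, so neither can terminate the quoted value early. The order matters; escaping quotes first would double-escape the backslashes you add.

```python
def escape_local_param(value: str) -> str:
    # Hypothetical helper: double backslashes before escaping quotes,
    # otherwise the added backslashes get re-escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')

nested = 'TERM2 TERM3 "TERM4 TERM5"'
q = '{!lucene df=text v="%s"}' % escape_local_param(nested)
```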
Re: Global query parameters to facet query
Created SOLR-5542. Anyone else want it? On Thu, Dec 5, 2013 at 8:55 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases we have a lot of facet.query parameters for a single q), and using LocalParams for each facet.query is not convenient. Did I miss a normal way to solve this? Has anyone else encountered this requirement?
Re: Bad fieldNorm when using morphologic synonyms
Isaac, is there an easy way to recognize this problem? We also index synonym tokens in the same position (like you do, and I'm sure that our positions are set correctly). I could test whether the default similarity factory in solrconfig.xml had any effect (before/after reindexing). --roman On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Replicating from the correct collections in SolrCloud on solr start
I have a Solr configuration that I am trying to replicate on several machines as part of a package installation. I have a cluster of machines that will run the SolrCloud, with 3 machines in the cluster running a zookeeper ensemble. As part of the installation of each machine, Solr is started with the desired configuration uploaded (java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=ipaddress1:2181,ipaddress2:2181,ipaddress3:2181 -jar start.jar). My problem is that when I add a new machine to my SolrCloud cluster, I expect it to replicate data from the collections I have in SolrCloud. This doesn't appear to be happening. Instead, each new machine just replicates the default collection1 collection. I'd added the collection in question with this command: http://localhost:8983/solr/admin/collections?action=CREATE&name=SolrCloudTest&numShards=1&replicationFactor=2&collection.configName=myconf So my question is simple: Why is it that when I start a new Solr instance on the same zookeeper ensemble, it does not replicate the data from the SolrCloudTest collection, and instead only replicates collection1? -- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-from-the-correct-collections-in-SolrCloud-on-solr-start-tp4105754.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Bad fieldNorm when using morphologic synonyms
You can see the norm value, in the explain text, when setting debugQuery=true. If the same item gets different norm before/after, that's it. Note that this configuration is in schema.xml (not solrconfig.xml...) On Monday, December 9, 2013, Roman Chyla wrote: Isaac, is there an easy way to recognize this problem? We also index synonym tokens in the same position (like you do, and I'm sure that our positions are set correctly). I could test whether the default similarity factory in solrconfig.xml had any effect (before/after reindexing). --roman On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default!
As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: JVM crashed when start solr
And it was only reproduced with JVM 32 bits, not 64 bits. Guido. On 09/12/13 17:46, Shawn Heisey wrote: On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
I am attempting to migrate from Solr 4.3 to Solr 4.6. When I run the example in 4.6, I get warnings about SortableIntField etc. asking me to consult the documentation to replace them accordingly. If these classes are deprecated, I think it would not be a good idea to use them in the examples as in: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_6/solr/example/example-DIH/solr/db/conf/schema.xml Here, weight, price and popularity seem to use the deprecated sfloat and sint. Does anyone know where I can find documentation on replacing these classes in my schema file? Thank you, O. O.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Patrick, Are you getting these stalls following a commit? If so then the issue is most likely fieldCache warming pauses. To stop your users from seeing this pause you'll need to add static warming queries to your solrconfig.xml to warm the fieldCache before it's registered. On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone pol...@townnews.com wrote: Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like: fq=start_time:[* TO NOW+5MINUTE/5MINUTE] so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second after that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like: fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day. If you want a start time within the next 5 minutes, I think your filter is not the right one. * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart Le lundi 09 décembre 2013 à 09:07 -0600, Patrick O'Lone a écrit : I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this; would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! 
We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you have to have Java 6 update (some number I couldn't find, but the latest should cover it) to be able to use it: 1. Remove all GC options you have and... 2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50 As a test of course; more information you can read in the following (and interesting) article. We also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from some others on this list. 
The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized() block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Javadoc for these deprecated classes suggests using TrieIntField, TrieFloatField and TrieDoubleField respectively instead. 10.12.2013, 01:19, O. Olson olson_...@yahoo.it: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField I am attempting to migrate from Solr 4.3 to Solr 4.6. When I run the example in 4.6, I get warnings SortableIntField etc. asking me to consult the documentation to replace them accordingly. If these classes are deprecated, I think it would not be a good idea to use them in the examples as in: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_6/solr/example/example-DIH/solr/db/conf/schema.xml Here, weight, price and popularity seem to use the deprecated sfloat and sint. Does anyone know where I can find documentation to replace these classes in my schema file. Thank you, O. O.
Re: Replicating from the correct collections in SolrCloud on solr start
This is currently as designed / expected. The reason that collection is replicated is because it's configured by default in a default Solr install. When you use the collections API, it only takes into account the current nodes. Eventually, there will be a mode where the Overseer will create/remove SolrCores based on the replicationFactor, etc. as you add and remove nodes, but that is not yet supported. If you add a node after the fact and want a replica on it, you have to preconfigure the SolrCore as is done with collection1 before starting the node, or use the Core Admin API to add the new SolrCore and make sure its collection param matches the collection you want to add it to and the shard param matches the shard you want to add it to. On Mon, Dec 9, 2013 at 12:40 PM, cwhi chris.whi...@gmail.com wrote: I have a Solr configuration that I am trying to replicate on several machines as part of a package installation. I have a cluster of machines that will run the SolrCloud, with 3 machines in the cluster running a zookeeper ensemble. As part of the installation of each machine, Solr is started with the desired configuration uploaded (java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=ipaddress1:2181,ipaddress2:2181,ipaddress3:2181 -jar start.jar). My problem is that when I add a new machine to my SolrCloud cluster, I expect it to replicate data from the collections I have in SolrCloud. This doesn't appear to be happening. Instead, each new machine just replicates the default collection1 collection. I'd added the collection in question with this command: http://localhost:8983/solr/admin/collections?action=CREATE&name=SolrCloudTest&numShards=1&replicationFactor=2&collection.configName=myconf So my question is simple: Why is it that when I start a new Solr instance on the same zookeeper ensemble, it does not replicate the data from the SolrCloudTest collection, and instead only replicates collection1? 
-- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-from-the-correct-collections-in-SolrCloud-on-solr-start-tp4105754.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Mark
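The Core Admin call Mark describes could be sketched as an HTTP request against the new node (hostname and core name here are placeholders, not from the thread):

```
# Hypothetical: add a replica core on the new node for an existing
# collection/shard via the Core Admin API. The collection and shard
# params tie the new core into the SolrCloudTest collection.
http://newnode:8983/solr/admin/cores?action=CREATE
    &name=SolrCloudTest_shard1_replica2
    &collection=SolrCloudTest
    &shard=shard1
```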
Re: solr.xml
Sounds like a bug. If you are seeing this happen in 4.6, I'd file a JIRA issue. - Mark On Sun, Dec 8, 2013 at 3:49 PM, William Bell billnb...@gmail.com wrote: Any thoughts? Why are we getting duplicate items in solr.xml? -- Forwarded message -- From: William Bell billnb...@gmail.com Date: Sat, Dec 7, 2013 at 1:48 PM Subject: solr.xml To: solr-user@lucene.apache.org We are having issues with SWAP CoreAdmin in 4.5.1 and 4.6. Using legacy solr.xml we issue a SWAP, and we want it persistent. It has been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in 4.5.1 doesn't work with persistent=true - it creates duplicate lines in solr.xml:
<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>
-- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- - Mark
Re: Prioritize search returns by URL path?
1) i would strongly advise you against falling into the trap of thinking things like Wiki posts should always be returned higher than blog posts ... unless you truly want *any* wiki post that matches your keywords, no matter how tangentially and how poorly, to come back higher on the list of results than any blog post -- even if that blog post is 100% dedicated to the keywords the user searched for. if that's really what you want, then all you need is sort=doc_type desc, score desc where you assign a numeric doc_type value at index time -- but i assure you, it's a terrible idea. 2) in general, what you are interested in is domain boosting ... where because of the specifics of your domain knowledge, you know that certain documents should generally score higher -- how much higher is an art form, that again is going to largely depend on the specifics of your domain, but you will most likely want it to be something you can tweak and tune. 3) regardless of the specifics of the website you are dealing with, and the URL structure used, what really matters is how you convert the raw data on your website into documents to be indexed -- when you do that, however you do that, is when you can add fields to your documents to convey information like this document is from the wiki or this document is from the forum or this document is a verified forum answer. If the only way you can conceptually know this information is by parsing the URL, then so be it -- but more than likely, if you are reading this data directly from an authoritative source (instead of just crawling URLs), there are easy methods to determine this stuff. . . . My initial suggestion would be to create a simple field called doc_type containing values like wiki, blog, forum, forum_verified, and forum_suggested ... with those values *indexed* for each doc, you can then use the ExternalFileField to associate a numeric value with each of those special values, and you can tune and tweak those numeric values w/o re-indexing. 
Then you should look into how boost functions work to make those numeric values an input into the final score calculations. In the long run however, you may want to consider indexing a general importance value for each doc that you re-compute periodically based not just on the *type* of the document, but also things like the number of page views, the number of votes for forum answers to be verified, etc... More information about domain boosting... https://people.apache.org/~hossman/ac2012eu/ http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630 On Fri, 6 Dec 2013, Jim Glynn wrote: : Date: Fri, 6 Dec 2013 13:10:59 -0800 (PST) : From: Jim Glynn jrgl...@hotmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Prioritize search returns by URL path? : : Thanks all. Yes, we can differentiate between content types by URL. : Everything else being equal, Wiki posts should always be returned higher : than blog posts, and blog posts should always be returned higher than forum : posts. : : Within forum posts, we want to rank Verified answered and Suggested answered : posts higher than unanswered posts. These cannot be identified via path - : only via metadata attached to the individual post. Any suggestions? : : @Alex, I'll investigate the references you provided. Thanks! : : : : -- : View this message in context: http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023p4105426.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss http://www.lucidworks.com/
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Thank you kydryavtsev andrey. Could you please suggest some examples? There is no documentation on this. Also, is there a reason why these deprecated classes are still used in the examples? I am looking for examples like below: Should I put the following in my schema.xml file to use the TrieIntField: <fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/> Is this specification correct? Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting? I have no clue how you get these. Thank you again, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4105781.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching for document by id in a sharded environment
Daniel, What version of Solr are you using? I'll see if I can recreate this. On Mon, Dec 9, 2013 at 7:21 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Daniel, TermQueryParser comes in handy when you don't want to escape. q = {!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 On Monday, December 9, 2013 2:14 PM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote: Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard) e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer - the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix with a backslash, or enclose the id within quotes). We're keen to avoid this, as this will require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk -- Joel Bernstein Search Engineer at Heliosearch
Newbie to SOLR with ridiculously simple questions
OK... I'm a Windows guy who is being forced to learn Solr on Ubuntu for the whole organization. I fancy myself somewhat capable of following directions but this Solr concept is puzzling. Here is what I think I know: Solr houses indexes. Each index record (usually based on a document) needs to be added to the Solr collection. This seems fairly simple and I can run the post.jar and various xml and json files FROM THE UBUNTU TERMINAL. I doubt you have to use the Terminal every time you want to add to an index. My guess is that you have to feed Solr from third party systems using the http update url into the solr server. Is this correct? Let's say I have (god forbid) a sharepoint site and I want to move all the document text and document metadata into Solr. Do I simply run a script (say in .NET or Coldfusion) that loops through the SP doc records and sends the http update url to Solr for each doc??? How does Tika fit in? thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to SOLR with ridiculously simple questions
Hi Steve, Good luck. I would start by doing the online tutorial if you haven't already (do it on Windows) and then reading a book. There are several on the market, including my own for beginners ( http://blog.outerthoughts.com/2013/06/my-book-on-solr-is-now-published/ ). For SharePoint, I would look at http://manifoldcf.apache.org/en_US/ ; they seem to be covering that use case specifically and sending information to Solr. For the more general case, I would look at SolrNet ( https://github.com/mausch/SolrNet/blob/master/Documentation/README.md ). To use Solr 4 with SolrNet, you would need to get the latest build or build it yourself from source; it is not terribly complicated. Tika is a separate Apache project bundled with Solr and is used to parse binary files (e.g. PDFs, MSWord, etc) and extract whatever is possible, usually structured metadata and some sort of internal text. For the interface, there are a couple of options, though most people are rolling their own. The main reason is that you should NOT expose Solr directly to the web (not secure), so there is a need for Solr middleware. Solr middleware is usually custom with project-specific enhancements, etc. But you could have a look at Hue for internal/intermediate usage. Hue is for the Hadoop ecosystem, but does include Solr support too: http://gethue.tumblr.com/tagged/search The most important point to remember when you are understanding Solr is that it is there for _search_. You shape your data to match that purpose. If that breaks relationships and duplicates data in Solr, that's fine. You still have your primary data safe in relational/document storage. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 10, 2013 at 6:13 AM, smetzger smetz...@msi-inc.com wrote: OK... 
Im a Windows guy who is being forced to learn SoLR on Ubuntu for the whole organizations. I fancy myself somewhat capable of following directions but this Solr concept is puzzling. Here is what I think i know. Solr houses indexes. Each index record (usually based on a document) need to be added to the Solr collection. This seems fairly simple and I can run the post.jar and various xml and json files FROM THE UBUNTU TERMINAL. I doubt you have to use the Terminal every time you want to add an index. My guess is that you have to feed Solr from third party systems using the http: update url into the solr server. Is this correct? Lets say i have a (god forbid) a sharepoint site and I want to move all the document text and document metadata into Solr. Do I simply run a script (say in .NET or Coldfusion) that loops through the SP doc records and sends out the http update url to Solr for each doc??? How does Tika fit in ? thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Global query parameters to facet query
: It seems that a facet query does not use the global query parameters (for : example, field aliasing for edismax parser). can you please give a specific example of a query that isn't working for you? Using this query against the example data, things work exactly as i would expect, showing that the QParsers used for facet.queries inherit the global params (unless overridden by local params of course)... http://localhost:8983/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.query={!dismax}solr+bogus&facet.query={!dismax%20mm=1}solr+bogus&facet.query={!dismax%20mm=1%20qf=%27foo_t%27}solr+bogus&rows=0&mm=2&qf=name { responseHeader:{ status:0, QTime:2, params:{ mm:2, facet:true, indent:true, facet.query:[{!dismax}solr bogus, {!dismax mm=1}solr bogus, {!dismax mm=1 qf='foo_t'}solr bogus], q:*:*, qf:name, wt:json, rows:0}}, response:{numFound:32,start:0,docs:[] }, facet_counts:{ facet_queries:{ {!dismax}solr bogus:0, {!dismax mm=1}solr bogus:1, {!dismax mm=1 qf='foo_t'}solr bogus:0}, facet_fields:{}, facet_dates:{}, facet_ranges:{}}} -Hoss http://www.lucidworks.com/
Re: Newbie to SOLR with ridiculously simple questions
Thanks for the reply Alex... in fact I am using your book! The book seems like a good tutorial... My bitnami solr instance however already includes Solr (running in background) and a directory structure: root --opt bitnami --apache-solr solr --collection1 I assume that the apache-solr directory is the same as the universal example directory mentioned in many tutorials. If I follow your book I create a new directory under apache-solr called SOLR-INDEXING with the collection1/conf/ and .xml files per your instructions. But now I have two instances running and somehow I need to point solr from the solr/collection1 core to the SOLR-INDEXING/collection1 core. I would think this could be done on the Solr Admin page but can't see how. If I try and restart the jetty with java -Dsolr.solr.home=SOLR-INDEXING -jar start.jar, it runs and does some install but I think it does not shut down the prior one first. In fact once I run that I lose all my solr and have to reinstall the VMWARE snapshot. Any guidance would be useful so I can continue with your book. Thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788p4105812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to SOLR with ridiculously simple questions
I think you might be complicating your life with the BitNami stack during learning. I would just download the latest Solr to your Windows desktop and go through the examples there. Still, you can try moving the collection1 directory under 'solr' and putting my examples there instead. Then, you don't need to change any scripts. Or rename collection1 to another name and add it to solr.xml as per instructions in the book to have it as a second core. Basically, change the content of the 'solr' directory rather than the scripts that make it work. But then you still need to know where the libraries are, as I bet the file path would be different from my book's instructions. Use the 'locate' command on unix to find where a jar might be. Just make sure the BitNami stack Solr is at least 4.3 (4.3.1?) as per the book's minimum requirements. Otherwise, more advanced examples will fail in strange ways. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 10, 2013 at 8:22 AM, smetzger smetz...@msi-inc.com wrote: Thanks for the reply Alex... in fact I am using your book! the book seems like a good tutorial ... My bitnami solr instance however already includes Solr (running in background) and a directory structure : root --opt bitnami --apache-solr solr --collection1 I assume that the apache-solr directory is the same as the universal example directory mentioned in many tutorials. If I follow your book I create a new directory under apache-solr called SOLR-INDEXING with the collection1/conf/ and .xml files per your instruction. but now i have two instances running and somehow I need to point solr from the solr/collection1 core to the SOLR-INDEXING/collection1 core I would think this could be done on the Solr Admin page but can't see how. 
If i try and restart the jetty with java -Dsolr.solr.home=SOLR-INDEXING -jar start.jarit runs and does some install but I think it does not shut down the prior one first. In fact once i run that i lose all my solr and have to reinstall the VMWARE snapshot. Any guidance would be useful so I can continue with your book. Thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788p4105812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
Thanks everybody for throwing in your ideas. So, I came to know that XML cannot carry random binary data, so I will encode the data in base64 format. Yes, I can write a custom URP which can convert the base64 encoded fields to binary fields. Now, I have binary fields in my document. My question is: how can I convert those binary fields to text so that Solr can index them? -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105826.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
Hi, Pls. find my response in-line: That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things). [Neeraj]: I thought about this solution but it won't work for me as there are a lot of text fields and their size is also very significant. I am looking for some other suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105827.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
On 12/9/2013 11:13 PM, neerajp wrote: Hi, Pls. find my response in-line: That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things). [Neeraj]: I thought about this solution but it won't work in my solution as there are a lot text fields and size is also very significant. I am looking for some other suggestion -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105827.html Sent from the Solr - User mailing list archive at Nabble.com. Assuming that your binary fields are mime attachments to email messages, they will probably already be encoded as base 64. Why not just leave them that way in solr too? You can't do much with them other than store them right? Or do you have some kind of image processing going on? You can always decode them in your client when you pull them out. -Mike
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Didn't know that it is now part of SOLR. Anyway, this is a red herring since I have totally removed Surround and the issue remains. Below is the debug info when I give a simple phrase query having common words with the default Query Parser. What I don't understand is why it is including single tokens as well. I have also included the relevant config part below. "rawquerystring": "Contents:\"only be\"", "querystring": "Contents:\"only be\"", "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")", "parsedquery_toString": "Contents:\"(only only_be) be\"", "QParser": "LuceneQParser", = <fieldtype name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/> </analyzer> </fieldtype> On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher erik.hatc...@gmail.com wrote: But again, as Ahmet mentioned… it doesn't look like the surround query parser is actually being used. The debug output also mentioned the query parser used, but that part wasn't provided below. One thing to note here, the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as maybe calling to the analysis request hander to parse the query before sending it to the surround query parser). 
Erik On Dec 9, 2013, at 9:36 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Yup on debugging I found that its coming in Analyzer. We are using Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams. Not sure if its a bug or I am missing some config. On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that surround query parser is not kicking in. You should see SrndQuery or something like at parser query section. On Monday, December 9, 2013 6:24 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: All, I posted this sub-issue with another issue few days back but maybe it was not obvious so posting it on a separate thread. We recently migrated to SOLR 4.6. We use Common Grams but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding individual tokens of those words in the query too which ends up slowing it. Below is an example: Query = only be Here is what debug shows. I have highlighted the red part which is different in both versions i.e. SOLR 4.6 is making it a multiphrasequery and adding individual tokens too. Can someone help? SOLR 4.6 (takes 20 secs) str name=rawquerystring{!surround}Contents:only be/str str name=querystring{!surround}Contents:only be/str str name=parsedqueryMultiPhraseQuery(Contents:(only only_be) be)/str str name=parsedquery_toStringContents:(only only_be) be/str SOLR 1.4.1 (takes 1 sec) str name=rawquerystring{!surround}Contents:only be/str str name=querystring{!surround}Contents:only be/str str name=parsedqueryContents:only_be/str str name=parsedquery_toStringContents:only_be/str-- Regards, Salman Akram -- Regards, Salman Akram -- Regards, Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I never used the common grams filter but I remember there are two classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It seems that CommonGramsQueryFilter is what you are after. http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html http://khaidoan.wikidot.com/solr-common-gram-filter

On Tuesday, December 10, 2013 6:43 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Anyway, this is a red herring since I have totally removed Surround and the issue remains. [...]
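Following Ahmet's pointer, one way this could be wired up is to keep CommonGramsFilterFactory at index time and use CommonGramsQueryFilterFactory at query time, so the query analyzer emits only the common-gram tokens instead of the single words too. This is a sketch based on the field type quoted in the thread, not a verified fix:

```xml
<fieldtype name="text" class="solr.TextField">
  <!-- Index side: emit both single tokens and common grams -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <!-- Query side: emit only the common grams for phrases of common words -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

With a single `<analyzer>` element (as in the config above) the same CommonGramsFilterFactory runs at query time too, which would explain the MultiPhraseQuery containing both the single tokens and the grams.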
Re: Getting Solr Document Attributes from a Custom Function
Hi Hoss, Thanks a lot for your response. The actual problem is: for every record that I query, I have to execute a formula and sort the records based on the value of the formula. The formula has elements from the record. For example, for the following document I need to apply the formula (maxprice - solrprice)/(maxprice - minprice) + count(cities)/totalcities, where maxprice, minprice and totalcities will be available at run time. So for the following record, it has to execute as (1 - *5000*)/(1 - 2000) + *2*/5 (where 5000 and 2, which are in bold, are from the document):

<doc>
  <field name="id">apartment_1</field>
  <field name="name">Casa Grande</field>
  <field name="locality">chennai</field>
  <field name="locality">bangalore</field>
  <field name="price">5000</field>
</doc>

Thanks & Regards, Mukund

On Tue, Dec 10, 2013 at 12:22 AM, Chris Hostetter hossman_luc...@fucit.org wrote: Smells like an XY problem ... Can you please describe what your end goal is in writing a custom function, and what you would do with things like the name field inside your function? In general, accessing stored field values for indexed documents can be prohibitively expensive; it rather defeats the entire point of the inverted index data structure. If you help us understand what your goal is, people may be able to offer performant suggestions. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

: Date: Mon, 9 Dec 2013 20:24:15 +0530
: From: Mukundaraman valakumaresan muk...@8kmiles.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Getting Solr Document Attributes from a Custom Function
:
: Hi All,
:
: I have written a custom Solr function and I would like to read a property
: of the document inside my custom function. Is it possible to get that
: using Solr?
:
: For eg. inside the floatVal method, I would like to get the value of the
: attribute "name":
:
: public class CustomValueSource extends ValueSource {
:
:     @Override
:     public FunctionValues getValues(Map context,
:             AtomicReaderContext readerContext) throws IOException {
:         return new FloatDocValues(this) {
:             @Override
:             public float floatVal(int doc) {
:                 /***
:                  getDocument(doc).getAttribute("name")
:                  */
:             }
:         };
:     }
: }
:
: Thanks & Regards
: Mukund

-Hoss http://www.lucidworks.com/
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Could you please suggest some examples. There is no documentation on this.

You can find examples with these field types in the Solr codebase (like this: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=co&content-type=text%2Fplain). You can find more details about Solr field types here http://wiki.apache.org/solr/SchemaXml or here https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

<fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/>

Yes, that looks like a correct specification.

Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting?

Only the name and class parameters are mandatory, but the optional parameters can also be useful for your field type. You can find what they actually mean here: https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

10.12.2013, 02:19, O. Olson olson_...@yahoo.it: Thank you kydryavtsev andrey. Could you please suggest some examples. There is no documentation on this. Also, is there a reason why these classes are not used in the examples even though they are deprecated? I am looking for examples like below. Should I put the following in my schema.xml file to use the TrieIntField:

<fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/>

Is this specification correct? Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting? I have no clue how you get these. Thank you again, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4105781.html Sent from the Solr - User mailing list archive at Nabble.com.
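For reference, the Trie-based replacements for all three deprecated sortable types can be declared along these lines (a sketch, not taken from the thread; `precisionStep="0"` indexes a single precision, which is typically what you want when the field is used only for sorting rather than range queries):

```xml
<fieldType name="sint"    class="solr.TrieIntField"    precisionStep="0" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat"  class="solr.TrieFloatField"  precisionStep="0" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.TrieDoubleField" precisionStep="0" sortMissingLast="true" omitNorms="true"/>
```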
Solr standard score
Hi, I have a requirement to standardize Solr scores. For example: docs with score > 7 are most relevant, docs with score between 4 and 7 are moderately relevant, and docs with score < 4 are less relevant. But in the real scenario this does not happen; in some scenarios the top document may have a score of 3.5. Can I have the scores standardized in some way (by index/query boosting) so that I can achieve this? Thanks, Prasi
Re: Difference between textfield and strfield
Hey Iori, Apologies for the misunderstanding :-). Yes, I agree with you, faceting will be OK with the TextField type; however, I'm concerned about the performance impact of running the facets if we have millions of documents. I wish in future we could apply tokenizers and filters to String fields :-). Thanks for your inputs. -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916p4105841.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting Solr Document Attributes from a Custom Function
You can implement it this way: index the number of cities as a new int field (like <field name="numberOfCities">2</field>) and implement a user function like customFunction(price, numberOfCities, 1, 2000, 5). A custom parser should parse this into a list of value sources. From the first two field sources we can get the per-doc values for those particular fields; the other three will be ConstValueSource instances (just constants), so we can access all 5 values and implement the custom formula per doc id. Find examples in ValueSourceParser and Solr functions like DefFunction or MinFloatFunction.

10.12.2013, 09:31, Mukundaraman valakumaresan muk...@8kmiles.com: Hi Hoss, Thanks a lot for your response. The actual problem is: for every record that I query, I have to execute a formula and sort the records based on the value of the formula. [...]
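Setting the ValueSource plumbing aside, the per-document arithmetic itself is simple. A minimal sketch in plain Java (the class and parameter names here are hypothetical, not Solr API; in a real custom function the price and city count would come from field value sources and the other three values would be constants parsed from the query):

```java
// Sketch of the thread's formula:
//   (maxprice - price) / (maxprice - minprice) + numberOfCities / totalCities
public class RelevanceFormula {

    public static float score(float price, float maxPrice, float minPrice,
                              int numberOfCities, int totalCities) {
        // Price term: how close the doc's price is to the run-time maximum.
        float priceTerm = (maxPrice - price) / (maxPrice - minPrice);
        // City term: fraction of the run-time total cities this doc covers.
        float cityTerm = (float) numberOfCities / totalCities;
        return priceTerm + cityTerm;
    }

    public static void main(String[] args) {
        // The thread's example document: price=5000, 2 cities,
        // with maxprice=1, minprice=2000, totalCities=5.
        System.out.println(score(5000f, 1f, 2000f, 2, 5));
    }
}
```

Inside a custom FunctionValues, floatVal(int doc) would call something like this with the per-doc values pulled from the wrapped value sources.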
Re: Solr standard score
The scores cannot be normalized that way. You can try, but it just isn't going to work the way you expect. Tell the people who wrote this requirement that it isn't possible. http://wiki.apache.org/lucene-java/ScoresAsPercentages wunder

On Dec 9, 2013, at 10:21 PM, Prasi S prasi1...@gmail.com wrote: Hi, I have a requirement to standardize Solr scores. [...]
Re: solr.xml
Thanks Mark. https://issues.apache.org/jira/browse/SOLR-5543

On Mon, Dec 9, 2013 at 2:39 PM, Mark Miller markrmil...@gmail.com wrote: Sounds like a bug. If you are seeing this happen in 4.6, I'd file a JIRA issue. - Mark

On Sun, Dec 8, 2013 at 3:49 PM, William Bell billnb...@gmail.com wrote: Any thoughts? Why are we getting duplicate items in solr.xml?

-- Forwarded message -- From: William Bell billnb...@gmail.com Date: Sat, Dec 7, 2013 at 1:48 PM Subject: solr.xml To: solr-user@lucene.apache.org

We are having issues with SWAP CoreAdmin in 4.5.1 and 4.6. Using legacy solr.xml we issue a SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core schema in 4.5.1 doesn't work with persistent="true" - it creates duplicate lines in solr.xml.

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Indexing on plain text and binary data in a single HTTP POST request
Please find my response in-line:

Assuming that your binary fields are mime attachments to email messages, they will probably already be encoded as base64. Why not just leave them that way in Solr too? You can't do much with them other than store them, right? Or do you have some kind of image processing going on? You can always decode them in your client when you pull them out.

[Neeraj]: Yes, the binary fields are mime attachments to email messages, but I want to index the attachments. For that I need to convert the base64-encoded data to binary format on the Solr side, and then, using some technique, extract text out of it so that the text can be indexed and I can search inside the attachments. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105860.html Sent from the Solr - User mailing list archive at Nabble.com.
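The first half of that pipeline (base64 back to raw bytes) is standard JDK; a minimal sketch, assuming the attachment body arrives as a base64 string (the class name is hypothetical, and the text-extraction step, e.g. handing the bytes to a library like Apache Tika, is omitted):

```java
import java.util.Base64;

// Decode a base64-encoded MIME attachment body back to raw bytes so the
// bytes can be handed to a text-extraction library before indexing.
public class AttachmentDecoder {

    public static byte[] decode(String base64Body) {
        // Base64.getDecoder() is the JDK's RFC 4648 decoder (Java 8+).
        return Base64.getDecoder().decode(base64Body);
    }

    public static void main(String[] args) {
        byte[] raw = decode("aGVsbG8=");     // base64 for "hello"
        System.out.println(new String(raw));
    }
}
```

The extracted text would then go into an indexed field; alternatively, Solr's extracting request handler (Solr Cell) can do the extraction server-side if the attachment is sent as a separate binary upload.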