Solr Sorting is not working properly on long Fields

2013-03-24 Thread ballusethuraman
Hi,  I am having a column named 'Kilometers' and when I try to sort with
that it is not working properly. The values in the 'Kilometers' column are:
17, 111, 97, 923, 65, 611. The values in 'Kilometers' after sorting are:
97, 923, 65, 611, 17, 111. The problem here is that when 97 is compared with
923, 97 is taken as the bigger number, because the values are compared as
strings ("97" > "923"). Initially the Kilometers column had string as its
datatype and I thought the problem could be because of that, so I changed
the datatype of that column to 'long'. Even then I couldn't see any change
in the results. But when I insert values which all have the same number of
digits, say 1, 2, 3, 4, 5, sorting works perfectly: 1, 2, 3, 4, 5. Can
anyone help me to get rid of this problem? Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Sorting-is-not-working-properly-on-long-Fields-tp4050833.html
Sent from the Solr - User mailing list archive at Nabble.com.

Solr sorting is not working properly on long Fields

2013-03-24 Thread ballusethuraman
Hi, I am having a column named 'Kilometers' and when I try to sort with that
it is not working properly. 


The values in 'Kilometers' column are, 


Kilometers 
17 
111 
97 
923 
65 
611 

Values in 'Kilometers' after sorting are 

Kilometers 
97
923 
65 
611 
17 
111 


The problem here is that when 97 is compared with 923, 97 is taken as the
bigger number, because the values are compared as strings ("97" > "923"). 


Initially the Kilometers column had string as its datatype and I thought the
problem could be because of that, so I changed the datatype of that column
to 'long'. Even then I couldn't see any change in the results.


But when I insert values which all have the same number of digits, say 1,
2, 3, 4, 5 

Kilometers 
2 
1 
4 
5 
2 
when I try to sort now, it works perfectly: 
Kilometers 
1 
2 
3 
4 
5 
Datatypes that I have tried are: 



   <field name="adi_f10001" type="wc_keywordText" indexed="true"
       stored="true" multiValued="false"/>

   <field name="adi_f10001" type="long" indexed="true" stored="true"
       multiValued="false"/>

   <field name="adi_f10001" type="double" indexed="true" stored="true"
       multiValued="false"/>
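
For reference, a numeric long field as defined in the stock Solr 4.x example
schema would look like the following (a sketch; with a Trie-based long like
this, sorting is numeric, but the existing data must be fully reindexed after
the type change):

   <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
       positionIncrementGap="0"/>
   <field name="adi_f10001" type="long" indexed="true" stored="true"
       multiValued="false"/>

A sort on that field would then be, e.g., q=*:*&sort=adi_f10001 asc.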



Can anyone help me to get rid of this problem? Thanks in advance. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-is-not-working-properly-on-long-Fields-tp4050834.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr sorting is not working properly on long Fields

2013-03-24 Thread Gora Mohanty
On 24 March 2013 11:56, ballusethuraman <ballusethura...@gmail.com> wrote:
 Hi, I am having a column named 'Kilometers' and when I try to sort with that
 it is not working properly.
[...]
 Initially Kilometers column was having string as datatype and I thought the
 problem could be because of that and i changed the datatype of that column
 to 'long'. Even then i couldn't see any change in the results.
[...]

Did you reindex after changing the data type of the column to
long?

Regards,
Gora


Solr using a ridiculous amount of memory

2013-03-24 Thread John Nielsen
Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of faceting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits' end. Any help would be sorely appreciated.

-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr sorting is not working properly on long Fields

2013-03-24 Thread ballusethuraman
Yes, I did, but there is no change in the result.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-is-not-working-properly-on-long-Fields-tp4050834p4050844.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR 4.2 SolrQuery exception

2013-03-24 Thread Sandeep Kumar Anumalla
I am using the code below and getting the following exception while using SolrQuery:



Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@795e0c2b 
main{StandardDirectoryReader(segments_49:524 _4v(4.2):C299313 
_4x(4.2):C2953/1396 _4y(4.2):C2866/1470 _4z(4.2):C4263/2793 _50(4.2):C3554/761 
_51(4.2):C1126/365 _52(4.2):C650/285 _53(4.2):C500/215 _54(4.2):C1808/1593 
_55(4.2):C1593)}
Mar 24, 2013 3:08:07 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1586)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=null path=null 
params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false}
 status=500 QTime=4
Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 24, 2013 3:08:07 PM 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener 
newSearcher
INFO: Loading spell index for spellchecker: default
Mar 24, 2013 3:08:07 PM 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener 
newSearcher
INFO: Loading spell index for spellchecker: wordbreak
Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [collection1] Registered new searcher Searcher@795e0c2b 
main{StandardDirectoryReader(segments_49:524 _4v(4.2):C299313 
_4x(4.2):C2953/1396 _4y(4.2):C2866/1470 _4z(4.2):C4263/2793 _50(4.2):C3554/761 
_51(4.2):C1126/365 _52(4.2):C650/285 _53(4.2):C500/215 _54(4.2):C1808/1593 
_55(4.2):C1593)}
Mar 24, 2013 3:08:07 PM org.apache.solr.core.CoreContainer registerCore
INFO: registering core: collection1
server value -
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer@3a32ea4
query value - q=smstext%3AEMIRATES&rows=50
Mar 24, 2013 3:08:07 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at SolrQueryResult.solrQuery(SolrQueryResult.java:31)
at SolrQueryResult.main(SolrQueryResult.java:65)

Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=null path=/select 
params={q=smstext%3AEMIRATES&rows=50} status=500 QTime=0
org.apache.solr.client.solrj.SolrServerException: 
org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at SolrQueryResult.solrQuery(SolrQueryResult.java:31)
at SolrQueryResult.main(SolrQueryResult.java:65)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
... 4 more


try {
String SOLR_HOME = "/data/solr1/example/solr/";
CoreContainer coreContainer = new CoreContainer(SOLR_HOME);
CoreDescriptor discriptor = new CoreDescriptor(coreContainer,

Tlog File not removed after hard commit

2013-03-24 Thread Niran Fajemisin
Hi all,

We import about 1.5 million documents on a nightly basis using DIH. During this
time, we need to ensure that all documents make it into the index, otherwise we
roll back on any errors, which DIH takes care of for us. We also disable
autoCommit in DIH but instruct it to commit at the very end of the import. This
is all done through configuration of the DIH config XML file and the command
issued to the request handler.

We have noticed that the tlog file appears to linger around even after DIH has 
issued the hard commit. My expectation would be that after the hard commit has 
occurred, the tlog file will be removed. I'm obviously misunderstanding how 
this all works.

Can someone please help me understand how this is meant to function? Thanks!

-Niran
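
For reference, tlog files are rotated on a hard commit, and Solr deliberately
keeps the most recent logs around for recovery and peer sync, so a lingering
file after the commit is normal. To keep them bounded during a large import,
the usual approach is a hard autoCommit that does not open a searcher - a
sketch with illustrative values, assuming a standard solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <!-- illustrative: hard-commit every 60s without exposing a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>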

Re: Solr Sorting is not working properly on long Fields

2013-03-24 Thread SUJIT PAL
Hi ballusethuraman, 

I am sure you have done this already, but just to be sure, did you reindex your 
existing kilometer data after you changed the data type from string to long? If 
not, then you should.

-sujit

On Mar 23, 2013, at 11:21 PM, ballusethuraman wrote:

 Hi,  I am having a column named 'Kilometers' and when I try to sort with
 that it is not working properly. The values in the 'Kilometers' column are:
 17, 111, 97, 923, 65, 611. The values in 'Kilometers' after sorting are:
 97, 923, 65, 611, 17, 111. The problem here is that when 97 is compared with
 923, 97 is taken as the bigger number, because the values are compared as
 strings ("97" > "923"). Initially the Kilometers column had string as its
 datatype and I thought the problem could be because of that, so I changed
 the datatype of that column to 'long'. Even then I couldn't see any change
 in the results. But when I insert values which all have the same number of
 digits, say 1, 2, 3, 4, 5, sorting works perfectly: 1, 2, 3, 4, 5. Can
 anyone help me to get rid of this problem? Thanks in advance.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Sorting-is-not-working-properly-on-long-Fields-tp4050833.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Practicality of enormous fields

2013-03-24 Thread Erick Erickson
Yeah, it is kind of weird, but certainly doable. The big gotcha is if you
want to _retrieve_ that field; that could take some time. If you just want
to search it, no problems that I know of. If you do want to retrieve it,
make sure lazy field loading is enabled and that you do NOT ask for this
field in results except when you really need it...
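
For reference, the solrconfig.xml knob Erick mentions is the following (a
sketch; it is already true in the stock example config), combined with an
explicit field list such as fl=id,title that leaves the huge field out:

<enableLazyFieldLoading>true</enableLazyFieldLoading>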

Best
Erick


On Tue, Mar 19, 2013 at 6:33 PM, jimtronic <jimtro...@gmail.com> wrote:

 What are the likely ramifications of having a stored field with millions of
 words?

 For example, if I had an article and wanted to store the user id of every
 user who has read it, stuck into a simple whitespace-delimited field.
 What would go wrong, and when?

 My tests lead me to believe this is not a problem, but it feels weird.

 Jim



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Practicality-of-enormous-fields-tp4049131.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Too many fields to Sort in Solr

2013-03-24 Thread Erick Erickson
Seems like a reasonable thing to do. Examine the debug output to ensure
that there's no short-circuiting being done as far as ConstantScoreQuery...
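
For instance, the poster's boosted query with debug output enabled would be (a
sketch):

q={!boost b=numdownloads.1}*:*&fq=countryId:1&debugQuery=true

The parsed-query and per-document score explanations in the debug section show
whether the boost function is actually being applied.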

Best
Erick


On Tue, Mar 19, 2013 at 7:05 PM, adityab <aditya_ba...@yahoo.com> wrote:

 Hi All,

 I want to validate my approach with the experts, just to make sure I am not
 doing anything wrong.

 #Docs in Solr : 25M
 Solr Version : 4.2

 Our requirement is to list the top downloaded documents based on user country.
 So we have a dynamic field *numdownloads.** which is evaluated as
 *numdownloads.countryId*

 Now, as sorting is expensive and also uses a large amount of Java heap, I
 planned to use this field to boost results instead.

 Old Query
 q=*:*&fq=countryId:1&sort=numdownloads.1 desc

 which I changed to
  q={!boost b=numdownloads.1}*:*&fq=countryId:1

 Is my approach correct? Any better alternative?

 thanks
 Aditya



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
Just to get started, do you hit OOM quickly with a few expensive queries, or 
is it after a number of hours and lots of queries?


Does Java heap usage seem to be growing linearly as queries come in, or are 
there big spikes?


How complex/rich are your queries (e.g., how many terms, wildcards, faceted 
fields, sorting, etc.)?


As a baseline experiment, start a Solr server, see how much Java heap is 
used/available. Then do a couple of typical queries, and check the heap size 
again. Then do a couple more similar but different (to avoid query cache 
matches), and check the heap again. Maybe do that a few times to get a 
handle on the baseline memory required and whether there might be a leak of 
some sort. Do enough queries to hit all of the fields, facets, sorting, 
etc. that are likely to be encountered in one of your typical days that hits 
OOM - just not the volume of queries. The goal is to determine if there is 
something inherently memory intensive in your index/queries, or something 
relating to a leak based on total query volume.


-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of faceting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits' end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen <j...@mcb.dk> wrote:

 Schema with DocValues attempt at solving problem:
 http://pastebin.com/Ne23NnW4
 Config: http://pastebin.com/x1qykyXW


This schema isn't using docvalues, due to a typo in your config:
it should not be DocValues="true" but docValues="true".

Are you not getting an error? Solr needs to throw exception if you
provide invalid attributes to the field. Nothing is more frustrating
than having a typo or something in your configuration and solr just
ignores this, reports no error, and doesn't work the way you want.
I'll look into this (I already intend to add these checks to analysis
factories for the same reason).

Separately, if you really want the terms data and so on to remain on
disk, it is not enough to just enable docvalues for the field. The
default implementation uses the heap. So if you want that, you need to
set docValuesFormat="Disk" on the fieldType. This will keep the
majority of the data on disk, and only some key datastructures in heap
memory. This might have significant performance impact depending upon
what you are doing so you need to test that.
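
Putting those two points together, a minimal schema sketch (field and type
names are illustrative; using a per-field docValuesFormat also requires
<codecFactory class="solr.SchemaCodecFactory"/> in solrconfig.xml):

<fieldType name="string_dvdisk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="manu_exact" type="string_dvdisk" indexed="true" stored="true"
    docValues="true"/>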


SOLR4/lucene and JVM memory management

2013-03-24 Thread Spyros Lambrinidis
Hi,

Does anyone know how solr4/lucene and the JVM manage memory?

We have the following case.

We have a 15GB server running only SOLR4/Lucene and the JVM (no custom code)

We had allocated 2GB of memory and the JVM was using 1.9GB. At some point
something happened and we ran out of memory.

Then we increased the JVM memory to 4GB and we see that gradually, JVM
starts to use as much as it can. It is now using 3GB out of the 4GB
allocated.

Is that normal JVM memory usage? i.e. Does the JVM always use as much as it
can from the allocated space?

Thanks for your help


-- 
Spyros Lambrinidis
Head of Engineering & Commando of
PeoplePerHour.com <http://www.peopleperhour.com>
Evmolpidon 23
118 54, Gkazi
Athens, Greece
Tel: +30 210 3455480

Follow us on Facebook http://www.facebook.com/peopleperhour
Follow us on Twitter http://twitter.com/#%21/peopleperhour


RE: SOLR4/lucene and JVM memory management

2013-03-24 Thread Toke Eskildsen
Spyros Lambrinidis [spy...@peopleperhour.com]:
 Then we increased the JVM memory to 4GB and we see that gradually, JVM
 starts to use as much as it can. It is now using 3GB out of the 4GB
 allocated.

That is to be expected. When the amount of garbage collections increases, the 
JVM might decide that it would be better overall to increase the size of the 
heap. Whether it will allocate up to your 4GB limit depends on how active it 
is. If you stress it, it will probably take the last GB. 

 i.e. Does the JVM always use as much as it can from the allocated space?

No, but the Oracle JVM does tend to be somewhat greedy (very subjective, I know). 
Since larger heaps mean longer (hopefully infrequent) pauses for full garbage 
collection with a standard setup, the consensus seems to be that it is best 
to allocate conservatively and thereby avoid over-allocation. If 2GB worked 
well for you until you hit OOM, changing to 3GB seems like a better choice than 
4GB to me. Especially since you describe the allocation up to 3GB as gradual, 
which tells me that your installation is not starved with 3GB.

- Toke Eskildsen
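
For the stock Jetty example, Toke's suggestion would amount to something like
(a sketch; flags and start script assumed):

java -Xmx3g -jar start.jar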

RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
From: John Nielsen [j...@mcb.dk]:
 The index is about 35GB on disk with each register between 15k and 30k.
 (This is simply the size of a full xml reply of one register. I'm not sure
 how to measure it otherwise.)

 Our memory requirements are running amok. We have less than a quarter of
 our customers running now and even though we have allocated 25GB to the JVM
 already, we are still seeing daily OOM crashes.

That does sound a bit peculiar. I do not understand what you mean by register
though. How many documents does your index hold?

 I can see from the memory dumps we've done that the field cache is by far
 the biggest sinner.

Do you sort on a lot of different fields?

 We do a lot of faceting. One client facets on about 50.000 docs of approx
 30k each on 5 fields. I understand that this is VERY memory intensive.

To get a rough approximation of memory usage, we need the total number of 
documents, the average number of values for each of the 5 fields for a document 
and the number of unique values in each of the 5 fields. The rule of thumb I 
use for lower ceiling is

#documents*log2(#references) + #references*log2(#unique_values) bit

If your whole index has 10M documents, which each has 100 values for each 
field, with each field having 50M unique values, then the memory requirement 
would be more than 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~= 
1.6GB for faceting on all fields. Even when we multiply that with 4 to get a 
more real-world memory requirement, it is far from the 25GB that you are 
allocating. Either you have an interestingly high number somewhere in the 
equation or something's off.

Regards,
Toke Eskildsen

Re: Recommendation for integration test framework

2013-03-24 Thread Furkan KAMACI
Unrelated to your question: you said that "We are utilizing Apache Maven
as build management tool". I think ant + ivy are currently the build and
dependency management tools, and the maven pom is generated via a plugin (if
I am wrong, please correct me). Are there any plans to move the project to
Maven?

2013/3/25 Jan Morlock <jan.morl...@googlemail.com>

 Hi,

 our solr implementation consists of several cores sometimes interacting
 with
 each other. Using SolrTestCaseJ4 didn't work out for us. Instead we would
 like to test the resulting war from outside using integration tests. We are
 utilizing Apache Maven as build management tool. Therefore we are currently
 thinking about using the maven failsafe plugin.
 Does anybody have experiences with using it in combination with solr? Or
 does somebody have a better recommendation for us?

 Thank you very much in advance
 Jan



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Recommendation-for-integration-test-framework-tp4050936.html
 Sent from the Solr - User mailing list archive at Nabble.com.
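
For reference, binding integration tests to the failsafe plugin is usually
just the following pom snippet (a sketch; plugin version omitted, and the
default *IT test-class naming convention is assumed):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>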



RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk]:
 If your whole index has 10M documents, which each has 100 values
 for each field, with each field having 50M unique values, then the 
 memory requirement would be more than 
 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~=
 1.6GB for faceting on all fields.

Whoops. Missed a 0 when calculating. The case above would actually take more 
than 15GB, probably also more than the 25GB you have allocated.


Anyway, I see now in your solrconfig that your main facet fields are cat, 
manu_exact, content_type and author_s, with the 5th being maybe price, 
popularity or manufacturedate_dt?

cat seems like category (relatively few references, few uniques), content_type 
probably has a single value/item and again few uniques. No memory problem 
there, unless you have a lot of documents (100M-range). That leaves manu_exact 
and author_s. If those are freetext fields with item descriptions or similar, 
that might explain the OOM.

Could you describe the facet fields in more detail and provide us with the 
total document count?


Quick sanity check: If you are using a Linux server, could you please verify 
that your virtual memory is set to unlimited with 'ulimit -v'?

Regards,
Toke Eskildsen


Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
A step I meant to include was that after you warm Solr with a 
representative collection of queries that references all of the fields, 
facets, sorting, etc. that your daily load will reference, check the Java 
heap size at that point, and then set your Java heap limit a moderate amount 
higher, like 256M more, restart, and then see what happens.


The theory is that if you have too much available heap, Java will gradually 
fill it all with garbage (no leaks implied, but maybe some leaks as well), 
and then a Java GC will be an expensive hit, and sometimes a rapid flow of 
incoming requests at that point can cause Java to freak out and even hit OOM 
even though a more graceful garbage collection would eventually free up tons 
of garbage.


So, by only allowing for a moderate amount of garbage, more frequent GCs 
will be less intensive and less likely to cause weird situations.


The other part of the theory is that it is usually better to leave tons of 
memory to the OS for efficiently caching files, rather than force Java to 
manage large amounts of memory, which it typically does not do so well.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Sunday, March 24, 2013 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

Just to get started, do you hit OOM quickly with a few expensive queries, or
is it after a number of hours and lots of queries?

Does Java heap usage seem to be growing linearly as queries come in, or are
there big spikes?

How complex/rich are your queries (e.g., how many terms, wildcards, faceted
fields, sorting, etc.)?

As a baseline experiment, start a Solr server, see how much Java heap is
used/available. Then do a couple of typical queries, and check the heap size
again. Then do a couple more similar but different (to avoid query cache
matches), and check the heap again. Maybe do that a few times to get a
handle on the baseline memory required and whether there might be a leak of
some sort. Do enough queries to hits all of the fields, facets, sorting,
etc. that are likely to be encountered in one of your typical days that hits
OOM - just not the volume of queries. The goal is to determine if there is
something inherently memory intensive in your index/queries, or something
relating to a leak based on total query volume.

-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of faceting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits' end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



Re: Too many fields to Sort in Solr

2013-03-24 Thread adityab
Thanks Erick. In this query (q=*:*) the Lucene score is always 1. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4050944.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANNOUNCE] Solr wiki editing change

2013-03-24 Thread Steve Rowe
The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more 
frequently of late, so the PMC has decided to lock it down in an attempt to 
reduce the work involved in tracking and removing spam.

From now on, only people who appear on 
http://wiki.apache.org/solr/ContributorsGroup will be able to 
create/modify/delete wiki pages.

Please request either on the solr-user@lucene.apache.org or on 
d...@lucene.apache.org to have your wiki username added to the 
ContributorsGroup page - this is a one-time step.

Steve

RE: SOLR 4.2 SolrQuery exception

2013-03-24 Thread Sandeep Kumar Anumalla
Hi,

I managed to resolve this issue and I am getting results now. But this 
time I am getting a different exception while loading the Solr container.

Here is the Code.

String SOLR_HOME = "/data/solr1/example/solr/collection1";
CoreContainer coreContainer = new CoreContainer(SOLR_HOME);
CoreDescriptor discriptor = new CoreDescriptor(coreContainer,
"collection1", new File(SOLR_HOME).getAbsolutePath());
SolrCore solrCore = coreContainer.create(discriptor);
coreContainer.register(solrCore, false);
File home = new File( SOLR_HOME );
File f = new File( home, "solr.xml" );
coreContainer.load( SOLR_HOME, f );
server = new EmbeddedSolrServer( coreContainer, "collection1" );
SolrQuery q = new SolrQuery();


Parameters inside Solrconfig.xml
<!-- <writeLockTimeout>1000</writeLockTimeout> -->
<lockType>simple</lockType>
<unlockOnStartup>true</unlockOnStartup>


WARNING: Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
SimpleFSLock@/data/solr1/example/solr/collection1/./data/index/write.lock
   at org.apache.lucene.store.Lock.obtain(Lock.java:84)
   at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:636)
   at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77)
   at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
   at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:192)
   at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:106)
   at 
org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:904)
   at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:801)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:619)
   at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:679)
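
For comparison, a minimal sketch (untested) built only from the calls already
shown in this thread, loading the container from solr.xml exactly once.
Creating and registering the core by hand and then also calling load() opens a
second IndexWriter on the same index directory, which is one way to hit this
lock timeout:

// load cores from solr.xml once; do not also create()/register() by hand
String solrHome = "/data/solr1/example/solr/";
CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load(solrHome, new File(solrHome, "solr.xml"));
SolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");
SolrQuery q = new SolrQuery("smstext:EMIRATES");
q.setRows(50);
System.out.println(server.query(q).getResults().getNumFound());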



From: Sandeep Kumar Anumalla
Sent: 24 March, 2013 03:44 PM
To: solr-user@lucene.apache.org
Subject: SOLR 4.2 SolrQuery exception

I am using the code below and getting the following exception while using SolrQuery:



Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@795e0c2b 
main{StandardDirectoryReader(segments_49:524 _4v(4.2):C299313 
_4x(4.2):C2953/1396 _4y(4.2):C2866/1470 _4z(4.2):C4263/2793 _50(4.2):C3554/761 
_51(4.2):C1126/365 _52(4.2):C650/285 _53(4.2):C500/215 _54(4.2):C1808/1593 
_55(4.2):C1593)}
Mar 24, 2013 3:08:07 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1586)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=null path=null 
params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false}
 status=500 QTime=4
Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 24, 2013 3:08:07 PM 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener 
newSearcher
INFO: Loading spell index for spellchecker: default
Mar 24, 2013 3:08:07 PM 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener 
newSearcher
INFO: Loading spell index for spellchecker: wordbreak
Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore 

Using Solrj to Get termVectors

2013-03-24 Thread Rendy Bambang Junior
Hi all,

I've enabled the term vector component and stored term vectors. The result
can be shown using an HTTP request in the browser. Since I'm planning to
build a web service using Java, I need to get those values using Solrj.

I've been googling find this solution
(http://stackoverflow.com/questions/8977852/how-to-parse-the-termvectorcomponent-response-to-which-java-object)
but it seems like some of those functions have been deprecated.

Does anybody know how to get termVectors using Solrj?

-- 
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
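
For what it's worth, a sketch of the usual SolrJ approach (assumptions: a
/tvrh handler with the TermVectorComponent enabled, as in the example
solrconfig; server is any SolrServer instance; there is no typed accessor, so
the raw NamedList response is walked by hand):

SolrQuery q = new SolrQuery("id:1");
q.setRequestHandler("/tvrh");  // handler configured with the TermVectorComponent
q.setParam("tv.tf", true);     // ask for term frequencies
QueryResponse rsp = server.query(q);
// term vectors come back under the "termVectors" key, keyed by document
NamedList<?> termVectors = (NamedList<?>) rsp.getResponse().get("termVectors");
System.out.println(termVectors);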


Re: how to get term vector information of specific word/position in field

2013-03-24 Thread vrparekh
Thanks Chris,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-get-term-vector-information-of-sepcific-word-position-in-field-tp4047637p4050997.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-24 Thread Trevor Campbell

I have three indexes which I have set up as three separate cores, using this 
solr.xml config.

  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
    <core name="jira-issue" instanceDir="jira-issue">
      <property name="dataDir" value="jira-issue/data" />
    </core>
    <core name="jira-comment" instanceDir="jira-comment">
      <property name="dataDir" value="jira-comment/data" />
    </core>
    <core name="jira-change-history" instanceDir="jira-change-history">
      <property name="dataDir" value="jira-change-history/data" />
    </core>
  </cores>

This works just fine as a standalone Solr.

I duplicated this setup on the same machine under a completely separate solr installation (solr-nodeb) and modified all 
the data directories to point to the directories in nodeb.  This all worked fine.


I then connected the 2 instances together with zoo-keeper using settings -Dbootstrap_conf=true 
-Dcollection.configName=jiraCluster -DzkRun -DnumShards=1 for the first instance and -DzkHost=localhost:9080 for the 
second. (I'm using tomcat and ports 8080 and 8081 for the 2 Solr instances.)


Now the data directories of the second node point to the data directories in 
the first node.

I have tried many settings in the solrconfig.xml for each core but am now using 
absolute paths, e.g.
<dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir>

previously I used
${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
but that had the same result.

It seems ZooKeeper is forcing the data directory config from the uploaded 
configuration onto all the nodes in the cluster?

How can I do testing on a single machine? Do I really need identical directory 
layouts on all machines?
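
One workaround to sketch here: with -Dbootstrap_conf=true each core's
solrconfig.xml is uploaded to ZooKeeper and shared by every replica of that
collection, so an absolute <dataDir> baked into the config wins on every node.
Keeping dataDir relative (or property-driven with a per-node default) lets each
core resolve its data under its own instanceDir, e.g.:

<dataDir>${solr.data.dir:data}</dataDir>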