Re: how to best convert some term in q to a fq

2013-12-27 Thread Josh Lincoln
What if you add your country field to qf with a strong boost? The search
experience would be slightly different than if you filtered on country, but
maybe still good enough for your users, and certainly simpler to implement
and maintain. You'd likely only want exact matches. Assuming you are using
edismax and a stopword file for your main query fields, you'll run into an
issue if you just index your country field as a string and there's a
stopword anywhere in your query; see SOLR-3085. To avoid this, yet still
boost on country only when there's an exact match, you could index the
country field as text using KeywordTokenizerFactory and the same stopword
file as your other fields.
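
For illustration, such a field type might look roughly like this in schema.xml
(a sketch; the type name is made up, and the lowercase filter is an extra
assumption for case-insensitive matching):

<fieldType name="country_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>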

Regardless of the approach you take, unless there's only a small list of
countries you care about, multi-word countries might be too big an issue to
ignore, especially when the name contains common words (e.g. United States,
South Korea, New Zealand). This may be a good candidate for named entity
recognition on the query, possibly leveraging OpenNLP. I once saw a
presentation on how LinkedIn uses NLP on the query to detect the types of
entities the user is looking for, which seems similar to what you're trying
to accomplish. Of course, if countries are the only thing you're interested
in, then you may be able to get away with client code that does simple
substring matching against a static list of countries, as in the sketch below.
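
A minimal sketch of that client-side rewriting, assuming a hand-maintained
country list (class, field, and method names here are illustrative):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Scans the raw query for known country names (longest first, so
// "south korea" would win over a hypothetical "korea") and moves each
// hit into an fq clause on the country field.
public class CountryQueryRewriter {

    private static final List<String> COUNTRIES = Arrays.asList(
            "united states", "south korea", "new zealand", "france");

    /** Returns {reducedQ, fq}; fq is empty when no country was found. */
    public static String[] rewrite(String q) {
        List<String> sorted = new ArrayList<String>(COUNTRIES);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) { return b.length() - a.length(); }
        });
        String remaining = q;
        StringBuilder fq = new StringBuilder();
        for (String country : sorted) {
            Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(country) + "\\b");
            if (p.matcher(remaining).find()) {
                if (fq.length() > 0) fq.append(" AND ");
                fq.append("country:\"").append(country).append("\"");
                remaining = p.matcher(remaining).replaceAll(" ");
            }
        }
        return new String[] { remaining.trim().replaceAll("\\s+", " "), fq.toString() };
    }
}

For 'apple pear france' this yields q="apple pear" and fq=country:"france".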

 On Dec 23, 2013 3:08 PM, "Joel Bernstein"  wrote:

> I would suggest handling this in the client. You could also write custom Solr
> code, but it would be more complicated because you'd be working with
> Solr's APIs.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Dec 23, 2013 at 2:36 PM, jmlucjav  wrote:
>
> > Hi,
> >
> > I have this scenario that I think is not unusual: Solr will get a
> > user-entered query string like 'apple pear france'.
> >
> > I need to do this: if any of the terms is a country, then change the
> > query params to move that term to an fq, i.e.:
> > q=apple pear france
> > to
> > q=apple pear&fq=country:france
> >
> > What do you guys think would be the best way to implement this?
> > - custom searchcomponent or queryparser
> > - servlet in same jetty as solr
> > - client code
> >
> > To simplify, consider countries are just a single term.
> >
> > Any pointer to an example to base this on would be great. Thanks.
> >
>


Re: lots of tlog.

2013-12-27 Thread yypvsxf19870706
Hi Mark

   Yes, I have auto commit on, just as on the other cores. All of my cores
have the same configuration: the maxTime is 60s and the max number is 10. No
new searcher is being opened.
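
   (For reference, with those values taken at face value, the autoCommit block
in solrconfig.xml would look roughly like the sketch below; whether "max
number" maps to maxDocs is my assumption.)

<autoCommit>
  <maxTime>60000</maxTime>        <!-- 60s -->
  <maxDocs>10</maxDocs>           <!-- the stated max number -->
  <openSearcher>false</openSearcher>
</autoCommit>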
   In contrast with the others, only this core is abnormal. It cannot even
replay the log, because of the exception.
   
org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
>> org.apache.solr.common.SolrException: Invalid Number 
>> java.math.BigDecimal:238124404


1. Why did the log replay fail?
2. Even with autocommit on, there are still lots of tlogs. I can only imagine
that something went wrong during the data import process. Are there any other
hypotheses about the problem?



Sent from my iPhone

On 2013-12-28, at 4:09, Mark Miller wrote:

> Do you have auto commit (not softAutoCommit) on? At what value? Are you
> ever opening a new searcher?
> 
> - Mark
> 
> On 12/27/2013 05:17 AM, YouPeng Yang wrote:
>> Hi
>>  There is a failed core in my SolrCloud cluster (Solr 4.6 with HDFS 2.2)
>> when I start my SolrCloud. I noticed that there are lots of tlog files [1].
>>  The start process was stuck; it needed to do log replay. However, it
>> encountered error [2].
>>  I do think it is abnormal that there are still a lot of tlogs. (I
>> compared it with other normal cores in my SolrCloud; there is only 1
>> tlog file in the normal cores.) Which conditions will lead to this
>> abnormally large number of tlog files? I tried many times to replay the
>> logs, but failed.
>>  Please give some suggestions.
>> 
>> 
>> Regards
>> [1]---
>> ..
>> -rw-r--r--   1 solr solr   23462198 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.676
>> -rw-r--r--   1 solr solr   26083634 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.677
>> -rw-r--r--   1 solr solr   25428275 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.678
>> -rw-r--r--   1 solr solr   15794489 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.679
>> -rw-r--r--   1 solr solr   23593272 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.680
>> -rw-r--r--   1 solr solr   23068981 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.681
>> -rw-r--r--   1 solr solr   21889334 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.682
>> -rw-r--r--   1 solr solr   23331127 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.683
>> -rw-r--r--   1 solr solr   22675763 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.684
>> -rw-r--r--   1 solr solr   21954870 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.685
>> -rw-r--r--   1 solr solr   21496118 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.686
>> -rw-r--r--   1 solr solr   20775222 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.687
>> -rw-r--r--   1 solr solr   24183093 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.688
>> -rw-r--r--   1 solr solr   24183090 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.689
>> -rw-r--r--   1 solr solr   24379701 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.690
>> -rw-r--r--   1 solr solr   25887033 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.691
>> -rw-r--r--   1 solr solr   25231672 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.692
>> -rw-r--r--   1 solr solr   15335737 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.693
>> -rw-r--r--   1 solr solr8529530 2013-12-27 14:33
>> /solr/repCore/repCore/core_node20/data/tlog/tlog.742
>> 
>> [2]-
>> 133462 [recoveryExecutor-48-thread-1] WARN
>> org.apache.solr.update.UpdateLog  – Starting log replay hdfs
>> tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node2
>> 0/data/tlog/tlog.693 refcount=2} active=false starting pos=0
>> 133576 [recoveryExecutor-48-thread-1] WARN
>> org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
>> org.apache.solr.common.SolrException: Invalid Number:
>> java.math.BigDecimal:238124404
>>at
>> org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
>>at
>> org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
>>at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582)
>>at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
>>   

Re: Trigger event on change of a field in a document

2013-12-27 Thread Utkarsh Sengar
Thanks! I think I will explore how to implement it outside Solr.

-Utkarsh


On Fri, Dec 27, 2013 at 3:20 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> And if you really really really wanted that in Solr then have a look at
> UpdateRequestProcessors.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 27, 2013 6:19 PM, "Otis Gospodnetic" 
> wrote:
>
> > Hi,
> >
> > This sounds like it would be best implemented outside the search engine.
> >
> > Otis
> > Solr & ElasticSearch Support
> > http://sematext.com/
> > On Dec 27, 2013 4:29 PM, "Utkarsh Sengar"  wrote:
> >
> >> I am experimenting with implementing a price drop feature.
> >> Can I register a document's fields and trigger some sort of event if
> >> the values in those fields change?
> >>
> >> For example:
> >> 1. Price of itemX is $10
> >> 2. Say the price changes to $17 or $5 (increases or decreases) when the
> >> new
> >> data loads.
> >> 3. Trigger an event to take an action on that change, like send out an
> >> email.
> >>
> >> I believe this is somewhat similar to, but not the same as, the percolator
> >> feature in Elasticsearch.
> >>
> >> --
> >> Thanks,
> >> -Utkarsh
> >>
> >
>



-- 
Thanks,
-Utkarsh


Re: Trigger event on change of a field in a document

2013-12-27 Thread Otis Gospodnetic
And if you really really really wanted that in Solr then have a look at
UpdateRequestProcessors.
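
For what it's worth, a skeleton of such a processor (a sketch against the Solr
4.x API; the price-lookup and notification hooks are hypothetical and left
unimplemented):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Intercepts each incoming document so the new "price" value can be
// compared against the currently indexed one before the add proceeds.
public class PriceChangeProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object id = doc.getFieldValue("id");
        Object newPrice = doc.getFieldValue("price");
        // Hypothetical hooks: fetch the currently indexed price (e.g. via a
        // real-time get against the core) and enqueue an email on change.
        // Object oldPrice = lookupCurrentPrice(id);
        // if (oldPrice != null && !oldPrice.equals(newPrice)) {
        //   notifyPriceChange(id, oldPrice, newPrice);
        // }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml.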

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 27, 2013 6:19 PM, "Otis Gospodnetic" 
wrote:

> Hi,
>
> This sounds like it would be best implemented outside the search engine.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 27, 2013 4:29 PM, "Utkarsh Sengar"  wrote:
>
>> I am experimenting with implementing a price drop feature.
>> Can I register a document's fields and trigger some sort of event if
>> the values in those fields change?
>>
>> For example:
>> 1. Price of itemX is $10
>> 2. Say the price changes to $17 or $5 (increases or decreases) when the
>> new
>> data loads.
>> 3. Trigger an event to take an action on that change, like send out an
>> email.
>>
>> I believe this is somewhat similar to, but not the same as, the percolator
>> feature in Elasticsearch.
>>
>> --
>> Thanks,
>> -Utkarsh
>>
>


Re: Trigger event on change of a field in a document

2013-12-27 Thread Otis Gospodnetic
Hi,

This sounds like it would be best implemented outside the search engine.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 27, 2013 4:29 PM, "Utkarsh Sengar"  wrote:

> I am experimenting with implementing a price drop feature.
> Can I register a document's fields and trigger some sort of event if
> the values in those fields change?
>
> For example:
> 1. Price of itemX is $10
> 2. Say the price changes to $17 or $5 (increases or decreases) when the new
> data loads.
> 3. Trigger an event to take an action on that change, like send out an
> email.
>
> I believe this is somewhat similar to, but not the same as, the percolator
> feature in Elasticsearch.
>
> --
> Thanks,
> -Utkarsh
>


Trigger event on change of a field in a document

2013-12-27 Thread Utkarsh Sengar
I am experimenting with implementing a price drop feature.
Can I register a document's fields and trigger some sort of event if
the values in those fields change?

For example:
1. Price of itemX is $10
2. Say the price changes to $17 or $5 (increases or decreases) when the new
data loads.
3. Trigger an event to take an action on that change, like send out an
email.

I believe this is somewhat similar to, but not the same as, the percolator
feature in Elasticsearch.

-- 
Thanks,
-Utkarsh


Re: lots of tlog.

2013-12-27 Thread Mark Miller
Do you have auto commit (not softAutoCommit) on? At what value? Are you
ever opening a new searcher?

- Mark

On 12/27/2013 05:17 AM, YouPeng Yang wrote:
> Hi
>   There is a failed core in my SolrCloud cluster (Solr 4.6 with HDFS 2.2)
> when I start my SolrCloud. I noticed that there are lots of tlog files [1].
>   The start process was stuck; it needed to do log replay. However, it
> encountered error [2].
>   I do think it is abnormal that there are still a lot of tlogs. (I
> compared it with other normal cores in my SolrCloud; there is only 1
> tlog file in the normal cores.) Which conditions will lead to this
> abnormally large number of tlog files? I tried many times to replay the
> logs, but failed.
>   Please give some suggestions.
> 
> 
> Regards
> [1]---
> ..
> -rw-r--r--   1 solr solr   23462198 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.676
> -rw-r--r--   1 solr solr   26083634 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.677
> -rw-r--r--   1 solr solr   25428275 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.678
> -rw-r--r--   1 solr solr   15794489 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.679
> -rw-r--r--   1 solr solr   23593272 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.680
> -rw-r--r--   1 solr solr   23068981 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.681
> -rw-r--r--   1 solr solr   21889334 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.682
> -rw-r--r--   1 solr solr   23331127 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.683
> -rw-r--r--   1 solr solr   22675763 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.684
> -rw-r--r--   1 solr solr   21954870 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.685
> -rw-r--r--   1 solr solr   21496118 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.686
> -rw-r--r--   1 solr solr   20775222 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.687
> -rw-r--r--   1 solr solr   24183093 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.688
> -rw-r--r--   1 solr solr   24183090 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.689
> -rw-r--r--   1 solr solr   24379701 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.690
> -rw-r--r--   1 solr solr   25887033 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.691
> -rw-r--r--   1 solr solr   25231672 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.692
> -rw-r--r--   1 solr solr   15335737 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.693
> -rw-r--r--   1 solr solr8529530 2013-12-27 14:33
> /solr/repCore/repCore/core_node20/data/tlog/tlog.742
> 
> [2]-
> 133462 [recoveryExecutor-48-thread-1] WARN
> org.apache.solr.update.UpdateLog  – Starting log replay hdfs
> tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node2
> 0/data/tlog/tlog.693 refcount=2} active=false starting pos=0
> 133576 [recoveryExecutor-48-thread-1] WARN
> org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
> org.apache.solr.common.SolrException: Invalid Number:
> java.math.BigDecimal:238124404
> at
> org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
> at
> org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
> at
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> at
> org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313)
> at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at
> java.util.c

Re: adding wild card at the end of the text and search(like sql like search)

2013-12-27 Thread Andrea Gazzarini
Hi Suren,
You could try a text field with a WordDelimiter filter + an EdgeNGram filter
(the latter only in the index analyzer). In this way your heading will be
indexed as

Jo
Joh
John
Johns

Johnsonso
Johnsonson
Johnsonsons

and a query for

Johnson & so

will be translated into

Johnsonso

therefore finding a match with one of the indexed tokens.
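
A sketch of such a field type (my assumptions: a keyword tokenizer so the whole
heading reaches the word-delimiter filter as one token for catenation, plus
lowercasing; names and gram sizes are illustrative):

<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>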

Best,
Andrea
On 27 Dec 2013 20:03, "suren"  wrote:

> My field type is text and I am using WhitespaceTokenizer. I want to search
> like a SQL LIKE search,
> i.e. I want to search the ORGANIZATION_NAME field:
> ORGANIZATION_NAM:"JOHNSON & SO"*
>
> which should return
> "JOHNSON & SON", "JOHNSON & SONS", "JOHNSON & SONS COMPANY"...
>
> I tried
> ORGANIZATION_NAM:"JOHNSON &" AND ORGANIZATION_NAM:SON*
> The problem with this is that it will also bring back "SWANK & SON, INC.
> C/O EVELYN JOHNSON". I want the result to be like a SQL LIKE search.
>
>
>
>
>


Re: adding wild card at the end of the text and search(like sql like search)

2013-12-27 Thread Ahmet Arslan
Hi Suren,

With https://issues.apache.org/jira/browse/SOLR-1604 it is possible, but it
requires re-packaging solr.war. Your query would be: ORGANIZATION_NAM:"JOHNSON
& SO*"


Ahmet


On Friday, December 27, 2013 9:03 PM, suren  wrote:
My field type is text and I am using WhitespaceTokenizer. I want to search
like a SQL LIKE search,
i.e. I want to search the ORGANIZATION_NAME field:
ORGANIZATION_NAM:"JOHNSON & SO"*

which should return
"JOHNSON & SON", "JOHNSON & SONS", "JOHNSON & SONS COMPANY"...

I tried
ORGANIZATION_NAM:"JOHNSON &" AND ORGANIZATION_NAM:SON*
The problem with this is that it will also bring back "SWANK & SON, INC.
C/O EVELYN JOHNSON". I want the result to be like a SQL LIKE search.






adding wild card at the end of the text and search(like sql like search)

2013-12-27 Thread suren
My field type is text and I am using WhitespaceTokenizer. I want to search
like a SQL LIKE search,
i.e. I want to search the ORGANIZATION_NAME field:
ORGANIZATION_NAM:"JOHNSON & SO"*

which should return
"JOHNSON & SON", "JOHNSON & SONS", "JOHNSON & SONS COMPANY"...

I tried
ORGANIZATION_NAM:"JOHNSON &" AND ORGANIZATION_NAM:SON*
The problem with this is that it will also bring back "SWANK & SON, INC.
C/O EVELYN JOHNSON". I want the result to be like a SQL LIKE search.






Re: Solr - Match whole word only in text fields

2013-12-27 Thread Ahmet Arslan
Hi Haya,

Yes, you are correct: "myName=aaa bbb" will produce the index terms "myName",
"aaa", "bbb". You can verify this at the admin analysis page; you can test your
analyzer by entering sample text in that user interface.
Your query "myName aaa" will be a phrase query and will match with the above
settings.
Your query "myName bbb" won't match.

http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Proximity_Searches

It is better to give it a try. 

Ahmet


On Friday, December 27, 2013 6:18 AM, Kydryavtsev Andrey  
wrote:
Hi everybody!

Ahmet, do I understand correctly that if I use this text_char_norm field type,
for the input "myName=aaa bbb" I'll index the terms "myName", "aaa", "bbb"? So
I'll match a query like "myName" or a query like "bbb", but not "myName aaa". I
can use this type for the query value too, splitting "myName aaa" into
("myName" && "aaa"), and that will work. But this approach will give a false
positive match for "myName bbb". What do you think, how can I handle this? One
of the approaches is to use KeywordTokenizer+ShingleFilter in this field type
instead of WhitespaceTokenizerFactory, so that tokens like "myName", "myName
aaa", "myName aaa bbb", "aaa", "aaa bbb", "bbb" will be indexed, but this
significantly increases index size in the case of long values.
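
A sketch of that shingle variant (one caveat as an assumption: shingles are
built across tokens, so a whitespace tokenizer feeds the shingle filter here; a
keyword tokenizer would emit a single token and leave nothing to shingle):

<fieldType name="text_shingle" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>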


26.12.2013, 03:20, "Ahmet Arslan" :
> Hi Haya,
>
> With MappingCharFilter you can have full control over the character set that
> you want to split on.
>
> in mappings.txt you will have
>
> ":" => " "
> "=" => " "
>
> Use the following type and see if it suits your needs; update
> mappings.txt according to your needs.
>
> <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> On Sunday, December 22, 2013 9:19 PM, haya.axelrod  
> wrote:
> I have a text field that can contain very long values (like text files). I
> want to create a field type for it (text, not string), in order to have
> something like "Match whole word only" in Notepad++, but the delimiter
> should not be only white spaces. If I have:
>
> myName=aaa bbb
>
> I would like to get it for the following search strings: "aaa", "bbb", "aaa
> bbb", "myName=aaa bbb", "myName", but not for "aa" or "ame=a" or "a bb".
> Another example is:
>
> aaa bbb
> Can I do this somehow?
>
> What should my field type definition be?
>
> The text can contain any character. Before searching, I'm escaping the search
> string using
> http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html
>
> Thanks
>


Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-27 Thread Greg Preston
Interesting.  I'm not using score at all (all searches have an
explicit sort defined).  I'll try setting omit norms on all my fields
and see if I can reproduce.
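
(For reference, norms can be switched off per field or per field type in
schema.xml; the field and type names below are illustrative:)

<field name="body" type="text_general" indexed="true" stored="true"
       omitNorms="true"/>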

Thanks.

-Greg


On Fri, Dec 27, 2013 at 4:25 AM, Michael McCandless
 wrote:
> Likely this is for field norms, which use doc values under the hood.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
>  wrote:
>> Does anybody with knowledge of solr internals know why I'm seeing
>> instances of Lucene42DocValuesProducer when I don't have any fields
>> that are using DocValues?  Or am I misunderstanding what this class is
>> for?
>>
>> -Greg
>>
>>
>> On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
>>  wrote:
>>> Hello,
>>>
>>> I'm loading up our solr cloud with data (from a solrj client) and
>>> running into a weird memory issue.  I can reliably reproduce the
>>> problem.
>>>
>>> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
>>> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
>>> host has 256G of memory
>>> - index and tlogs on ssd
>>> - Xmx=7G, G1GC
>>> - Java 1.7.0_25
>>> - schema and solrconfig.xml attached
>>>
>>> I'm using composite routing to route documents with the same clientId
>>> to the same shard.  After several hours of indexing, I occasionally
>>> see an IndexWriter go OOM.  I think that's a symptom.  When that
>>> happens, indexing continues, and that node's tlog starts to grow.
>>> When I notice this, I stop indexing, and bounce the problem node.
>>> That's where it gets interesting.
>>>
>>> Upon bouncing, the tlog replays, and then segments merge.  Once the
>>> merging is complete, the heap is fairly full, and forced full GC only
>>> helps a little.  But if I then bounce the node again, the heap usage
>>> goes way down, and stays low until the next segment merge.  I believe
>>> segment merges are also what causes the original OOM.
>>>
>>> More details:
>>>
>>> Index on disk for this node is ~13G, tlog is ~2.5G.
>>> See attached mem1.png.  This is a jconsole view of the heap during the
>>> following:
>>>
>>> (Solr cloud node started at the left edge of this graph)
>>>
>>> A) One CPU core pegged at 100%.  Thread dump shows:
>>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>>java.lang.Thread.State: RUNNABLE
>>> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
>>> at 
>>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
>>> at 
>>> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
>>> at 
>>> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
>>> at 
>>> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
>>> at 
>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
>>> at 
>>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>>> at 
>>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>> at 
>>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>
>>> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
>>> memory freed.  Thread dump shows:
>>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>>java.lang.Thread.State: RUNNABLE
>>> at 
>>> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
>>> at 
>>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
>>> at 
>>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
>>> at 
>>> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
>>> at 
>>> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
>>> at 
>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>>> at 
>>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>>> at 
>>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>> at 
>>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>
>>> C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
>>> freed.  Thread dump shows:
>>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>>java.lang.Thread.State: RUNNABLE
>>> at 
>>> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.j

Error - please help

2013-12-27 Thread anand chandak

Hi,

I am running into the errors below when running the CheckIndex utility:


java -cp
/apps/search/tomcat/webapps/solr/WEB-INF/lib/lucene-core-4.4.0.jar
org.apache.lucene.index.CheckIndex .

NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene...', so assertions are enabled

Opening index @ .

ERROR: could not read any segments file in directory
java.io.EOFException: read past EOF:
MMapIndexInput(path="/apps/search/data/customers/solr/solr/adidas-archive/data/index.20131227051833263/segments_a")
  at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:78)
  at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
  at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:320)
  at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:380)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:663)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:376)
  at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:382)
  at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1854)

Does that mean my index is corrupt? And how should I fix it? This
error is seen on the slave, which is running on 4.x, with the master on 3.x,
and the index is large (100 GB+). Additionally, I am seeing the following
error message in the logs:

2013-12-27 05:03:32,391 [explicit-fetchindex-cmd] ERROR 
org.apache.solr.handler.ReplicationHandler- SnapPull failed 
:org.apache.solr.common.SolrException: Index fetch failed :
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:485)
  at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:319)
  at 
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:220)
   Caused by: java.io.EOFException: read past EOF: 
MMapIndexInput(path="/apps/search/data/customers/solr/solr/adidas-archive/data/index.20131227050332242/segments_a")
  at 
org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:78)
  at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)

  at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:320)
  at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:380)
  at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
  at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:663)

  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:376)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:711)
  at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
  at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
  at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
  at 
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:179)
  at 
org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:632)
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:469)

  ... 2 more
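
As the tool's own NOTE above suggests, CheckIndex is more thorough with
assertions enabled. It also has a -fix mode, but that drops unreadable
segments (losing the documents they contain), so it should only ever be run
against a backup copy. A sketch:

java -ea:org.apache.lucene... -cp lucene-core-4.4.0.jar \
  org.apache.lucene.index.CheckIndex /path/to/index

java -ea:org.apache.lucene... -cp lucene-core-4.4.0.jar \
  org.apache.lucene.index.CheckIndex /path/to/index -fix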


Any help or guidance is much appreciated.


Thanks,
Anand.


Query time join with conditions

2013-12-27 Thread heaven
Hello, I have one physical Solr collection and multiple logical collections
in it. The separation is done using the "type" field (Ruby on Rails
application). So I have 2 logical collections, Profile and RssEntry, and
would not want to add RssEntry content to the Profile index.

When I want to search some profiles my usual query looks like this:
[ path=select parameters={fq: ["type:Profile"], sort: "score desc", start:
0, rows: 20, defType: "edismax", boost: [], q: "*:*"} ]

But this time I need to also search in associated RssEntries content. In SQL
my query would look like this:
SELECT * FROM profiles INNER JOIN (SELECT * FROM rss_entries WHERE
rss_entries.keywords LIKE '#{query}') ON profiles.id =
rss_entries.profile_id

So basically, in the Solr query, I need something like {!join from=profile_id
to=id WHERE type = 'RssEntry'}. I heard there are additional parameters like
fromQuery and fromSearcher, but I can't find anything about them here:
http://wiki.apache.org/solr/Join
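
For what it's worth, the {!join} parser applies everything after the local
params to the "from" side, so the type restriction can live in the join
subquery itself; a sketch using the field names above:

fq=type:Profile
fq={!join from=profile_id to=id}type:RssEntry AND keywords:yourquery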

Any help appreciated,
Alex





RE: Disable caching on sorting to improve performance

2013-12-27 Thread PeterKerk
Thanks and good call, that has been there for quite some time! 
I've changed it to: -Xms200m -Xmx1500m 
I'll look into the effect of this first.





RE: Disable caching on sorting to improve performance

2013-12-27 Thread Markus Jelsma
Hi,

Sorting uses the Lucene FieldCache, which you cannot disable. If you have lots
of documents and sort on three fields, you must increase the heap space. Also,
note that you have defined Xmx twice here; I don't know what effect that will
have.
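
Presumably the first flag was meant to be an initial heap size, -Xms. A
corrected sketch of the startup command, with illustrative sizes:

java -Dsolr.solr.home="./example-DIH/solr/" -Xms512m -Xmx1200m -jar start.jar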

Cheers
 
 
-Original message-
> From:PeterKerk 
> Sent: Friday 27th December 2013 12:39
> To: solr-user@lucene.apache.org
> Subject: Disable caching on sorting to improve performance
> 
> I'm getting a lot of Java heap space errors. I've now been reading up on
> Solr performance (in the meantime also configuring the Sematext tools to try
> to drill down to the cause).
> 
> I already increased the memory available to Solr:
> bash -c "cd /cygdrive/c/Databases/solr-4.3.1/example/;java
> -Dsolr.solr.home="./example-DIH/solr/" -jar -Xmx200m -Xmx1200m start.jar" 
> 
> And now I read:
> Factors that affect memory usage:
> http://stackoverflow.com/questions/1546898/how-to-reduce-solr-memory-usage
> I see that sorting affects memory usage. I have the feeling that is the case
> for me, because since I implemented sorting the memory errors are going
> through the roof. 
> 
> I read here
> http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters how to
> do it for filter queries, but I was wondering how I can disable the caching
> for the sort parameter in the statement below, where it now has
> `&sort=clickcount%20desc,prijs%20desc,updatedate%20desc`
> 
> 
> searchquery.Append("&fl=id,artikelnummer,titel,friendlyurl,pricerange,lang,currency,createdate")
> searchquery.Append("&facet.field=pricerange")
> searchquery.Append("&facet.mincount=1") 
> searchquery.Append("&facet.sort=index") 
> searchquery.Append("&omitHeader=true")
> searchquery.Append("&sort=clickcount%20desc,prijs%20desc,updatedate%20desc") 
>   
> 
> 
> 
> 


Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-27 Thread Michael McCandless
Likely this is for field norms, which use doc values under the hood.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
 wrote:
> Does anybody with knowledge of solr internals know why I'm seeing
> instances of Lucene42DocValuesProducer when I don't have any fields
> that are using DocValues?  Or am I misunderstanding what this class is
> for?
>
> -Greg
>
>
> On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
>  wrote:
>> Hello,
>>
>> I'm loading up our solr cloud with data (from a solrj client) and
>> running into a weird memory issue.  I can reliably reproduce the
>> problem.
>>
>> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
>> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
>> host has 256G of memory
>> - index and tlogs on ssd
>> - Xmx=7G, G1GC
>> - Java 1.7.0_25
>> - schema and solrconfig.xml attached
>>
>> I'm using composite routing to route documents with the same clientId
>> to the same shard.  After several hours of indexing, I occasionally
>> see an IndexWriter go OOM.  I think that's a symptom.  When that
>> happens, indexing continues, and that node's tlog starts to grow.
>> When I notice this, I stop indexing, and bounce the problem node.
>> That's where it gets interesting.
>>
>> Upon bouncing, the tlog replays, and then segments merge.  Once the
>> merging is complete, the heap is fairly full, and forced full GC only
>> helps a little.  But if I then bounce the node again, the heap usage
>> goes way down, and stays low until the next segment merge.  I believe
>> segment merges are also what causes the original OOM.
>>
>> More details:
>>
>> Index on disk for this node is ~13G, tlog is ~2.5G.
>> See attached mem1.png.  This is a jconsole view of the heap during the
>> following:
>>
>> (Solr cloud node started at the left edge of this graph)
>>
>> A) One CPU core pegged at 100%.  Thread dump shows:
>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>java.lang.Thread.State: RUNNABLE
>> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
>> at 
>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
>> at 
>> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
>> at 
>> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
>> at 
>> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
>> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
>> at 
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
>> memory freed.  Thread dump shows:
>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>java.lang.Thread.State: RUNNABLE
>> at 
>> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
>> at 
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
>> at 
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
>> at 
>> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
>> at 
>> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
>> at 
>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>> at 
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> at 
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>> C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
>> freed.  Thread dump shows:
>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>java.lang.Thread.State: RUNNABLE
>> at 
>> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
>> at 
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
>> at 
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
>> at 
>> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesC

Disable caching on sorting to improve performance

2013-12-27 Thread PeterKerk
I'm getting a lot of Java heap space errors. I've now been reading up on
Solr performance (in the meantime also configuring the Sematext tools to try
to drill down to the cause).

I already increased the memory available to Solr:
bash -c "cd /cygdrive/c/Databases/solr-4.3.1/example/;java
-Dsolr.solr.home="./example-DIH/solr/" -jar -Xmx200m -Xmx1200m start.jar" 

And now I read:
Factors that affect memory usage:
http://stackoverflow.com/questions/1546898/how-to-reduce-solr-memory-usage
I see that sorting affects memory usage. I have the feeling that is the case
for me, because since I implemented sorting the memory errors are going
through the roof. 

I read here
http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters how to
do it for filter queries, but I was wondering how I can disable the caching
for the sort parameter in the statement below, where it now has
`&sort=clickcount%20desc,prijs%20desc,updatedate%20desc`


searchquery.Append("&fl=id,artikelnummer,titel,friendlyurl,pricerange,lang,currency,createdate")
searchquery.Append("&facet.field=pricerange")
searchquery.Append("&facet.mincount=1") 
searchquery.Append("&facet.sort=index") 
searchquery.Append("&omitHeader=true")
searchquery.Append("&sort=clickcount%20desc,prijs%20desc,updatedate%20desc") 
  





Re: How to use Solr in my project

2013-12-27 Thread Gopal Agarwal
Highlighting can be done as a three-step process:

Prerequisite: get a text PDF produced by OCR of the image PDF.

Step 1:
To send the extracted text content from the text PDF to Solr, use a low-level
PDF converter such as poppler-utils (pdftotext or pdftohtml) to correctly get
the coordinates and page number of each word. Store these in a separate file
as a word map; the word map maps page and coordinates to the occurrence number
of each word (see the sketch below).
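
A sketch of that extraction step, assuming poppler's pdftotext with its -bbox
option, which writes each word with page and coordinate attributes:

pdftotext -bbox input.pdf wordmap.html
# produces per-page entries such as:
#   <word xMin="72.0" yMin="84.4" xMax="113.2" yMax="96.1">Johnson</word>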

Step 2:
The Solr highlighter needs to be changed to return each word and its
occurrence number in the text document, rather than the character offsets for
each hit.

Step 3:
Combine the Solr output with the word map created in step 1; the page and
coordinates in the original PDF document can then be generated, and any
viewer can do the highlighting.

We have successfully implemented this for our own application.

Thanks,
Gopal


On Thu, Dec 26, 2013 at 3:56 PM, Gora Mohanty  wrote:

> On 26 December 2013 15:44, Fatima Issawi  wrote:
> > Hi,
> >
> > I should clarify. We have another application extracting the text from
> the document. The full text from each document will be stored in a database
> either at the document level or page level (this hasn't been decided yet).
> We will also be storing word location of each word on the page in the
> database.
>
> What do you mean by "word location"? The number on the page? What purpose
> would this serve?
>
> > What I'm having problems with is deciding on the schema. We want a user
> to be able to search for a word in the database, have a list of documents
> that word is located in, and location in the document that word is located
> it. When he selects the search results, we want the scanned picture to have
> that word highlighted on the page.
> [...]
>
> I think that you might be confusing things:
> * If you have the full-text, you can highlight where the word was found.
>   Solr highlighting handles this for you, and there is no need to store
>   word location.
> * You can have different images (presumably, individual scanned pages)
>   linked to different sections of text, and show the entire image.
>   Highlighting in the image is not possible, unless by "word location"
>   you mean the (x, y) coordinates of the word on the page. Even then:
>   - It will be prohibitively expensive to store the location of every
>     word in every image for a large number of documents
>   - Some image processing will be required to handle the highlighting
>     after the scanned image is retrieved
>
> Regards,
> Gora
>


lots of tlog.

2013-12-27 Thread YouPeng Yang
Hi
  There is a failed core in my SolrCloud cluster (Solr 4.6 with HDFS 2.2)
when I start my SolrCloud. I noticed that there are lots of tlog files [1].
  The start process was stuck; it needed to do log replay. However, it
encountered error [2].
  I do think it is abnormal that there are still a lot of tlogs. (I
compared it with other normal cores in my SolrCloud; there is only 1
tlog file in the normal cores.) Which conditions will lead to this
abnormally large number of tlog files? I tried many times to replay the
logs, but failed.
  Please give some suggestions.


Regards
[1]---
..
-rw-r--r--   1 solr solr   23462198 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.676
-rw-r--r--   1 solr solr   26083634 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.677
-rw-r--r--   1 solr solr   25428275 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.678
-rw-r--r--   1 solr solr   15794489 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.679
-rw-r--r--   1 solr solr   23593272 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.680
-rw-r--r--   1 solr solr   23068981 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.681
-rw-r--r--   1 solr solr   21889334 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.682
-rw-r--r--   1 solr solr   23331127 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.683
-rw-r--r--   1 solr solr   22675763 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.684
-rw-r--r--   1 solr solr   21954870 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.685
-rw-r--r--   1 solr solr   21496118 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.686
-rw-r--r--   1 solr solr   20775222 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.687
-rw-r--r--   1 solr solr   24183093 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.688
-rw-r--r--   1 solr solr   24183090 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.689
-rw-r--r--   1 solr solr   24379701 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.690
-rw-r--r--   1 solr solr   25887033 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.691
-rw-r--r--   1 solr solr   25231672 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.692
-rw-r--r--   1 solr solr   15335737 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.693
-rw-r--r--   1 solr solr8529530 2013-12-27 14:33
/solr/repCore/repCore/core_node20/data/tlog/tlog.742

[2]-
133462 [recoveryExecutor-48-thread-1] WARN
org.apache.solr.update.UpdateLog  – Starting log replay hdfs
tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node2
0/data/tlog/tlog.693 refcount=2} active=false starting pos=0
133576 [recoveryExecutor-48-thread-1] WARN
org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
org.apache.solr.common.SolrException: Invalid Number:
java.math.BigDecimal:238124404
at
org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
at
org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313)
at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
133623 [recoveryExecutor-48-thread-1] WARN
org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
org.apache.solr.common.SolrException: Invalid Number:
java.math.BigDecimal:238124405
at
org.apache.s

Re: Solr Query Slowliness

2013-12-27 Thread Jilal Oussama
Thank you guys for your replies,

Sorry that I forgot to mention that I have allocated 10 GB of memory to the
Java Heap.


2013/12/26 Shawn Heisey 

> On 12/26/2013 3:38 AM, Jilal Oussama wrote:
> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB
> memory
> > & 840 GB storage) and contained several cores for different usage.
> >
> > When I manually executed a query through Solr Admin (a query containing
> > 10~15 terms, with some of them having boosts over one field and limited
> to
> > one result without any sorting or faceting etc ) it takes around 700
> > ms, and the Core contained 7 million documents.
> >
> > When the scripts are executed things get slower, my query takes 7~10s.
> >
> > Then what I did is to turn to SolrCloud expecting huge performance
> increase.
> >
> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU
> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one
> collection
> > to contain the core I was querying. I sharded it into 25 shards (each node
> > containing 5 shards without replication); each shard took 54 MB of
> storage.
> >
> > Tested my query on the new SolrCloud: it takes 70 ms! A huge increase,
> > which is very good!
> >
> > Tested my scripts again (I have 30 scripts running at the same time), and
> > to my surprise, things run fast for 5 seconds and then turn really slow
> > again
> > (query time ).
> >
> > I updated the solrconfig.xml to remove the query caches (I don't need
> > them
> > since the queries are very different, one-time queries) and changed the
> > index memory to 1 GB, but only got a small improvement (3~4s for each query
> ?!)
>
> Your SolrCloud setup has 35 times as much CPU power (just basing this on
> the ECU numbers) as your single-server setup, ten times as much memory,
> and a lot more IOPS because you moved to SSD.  A 10X increase in single
> query performance is not surprising.
>
> You have not indicated how much memory is assigned to the java heap on
> each server.  I think that there are three possible problems happening
> here, with a strong possibility that the third one is happening at the
> same time as one of the other two:
>
> 1) Full garbage collections are too frequent because the heap is too small.
> 2) Garbage collections take too long because the heap is very large and
> GC is not tuned.
> 3) Extremely high disk I/O because the OS disk cache is too small for
> the index size.
>
> Some information on these that might be helpful:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> The general solution for good Solr performance is to throw hardware,
> especially memory, at the problem.  It's worth pointing out that any
> level of hardware investment has an upper limit on the total query
> volume it can support.  Running 30 test scripts at the same time will be
> difficult for all but the most powerful and expensive hardware to deal
> with, especially if every query is different.  A five-server cloud where
> each server has 8 CPU cores and 15GB of memory is pretty small, all
> things considered.
>
> Thanks,
> Shawn
>
>