Re: Query modification
Hi, I am also using the QueryComponent to perform a similar modification to the query, in the process() method of the component. The problem I am facing is that after modifying the query and setting it on the response builder, I call super.process(rb). This call takes around 100ms and degrades the component's performance. I wanted to know: is process() the right place to do this, and do we need to call super.process()? Regards, Sidharth.
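A note for readers of the archive: super.process(rb) is where QueryComponent actually executes the search, so the ~100ms is most likely the query itself rather than overhead, and skipping the call would mean no results are produced. Query rewriting is usually done in prepare() instead. A minimal sketch, assuming the component is registered in solrconfig.xml in place of the standard query component; the status:active clause is an invented example:

  import java.io.IOException;

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.TermQuery;
  import org.apache.solr.handler.component.QueryComponent;
  import org.apache.solr.handler.component.ResponseBuilder;

  public class QueryRewritingComponent extends QueryComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      super.prepare(rb); // let QueryComponent parse q, fq, etc. first
      // wrap the parsed query with an extra required clause
      BooleanQuery rewritten = new BooleanQuery();
      rewritten.add(rb.getQuery(), Occur.MUST);
      rewritten.add(new TermQuery(new Term("status", "active")), Occur.MUST);
      rb.setQuery(rewritten);
    }
    // no process() override: the search itself still runs in
    // QueryComponent.process(), which cannot be skipped
  }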
caching HTML pages in SOLR
Hi, Google stores HTML pages as *cached* documents; is there a similar provision in Solr? I am using Solr 4.4.0. Thanks, Shailendra
Re: caching HTML pages in SOLR
Not in Solr itself, no. Solr is all about Search. Caching (and rewriting resource links, etc.) should probably be part of whatever does the document fetching. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: caching HTML pages in SOLR
Thanks Alex. I was wondering whether something of this sort already exists.
Re: caching HTML pages in SOLR
I have not used it myself, but perhaps something like http://www.crawl-anywhere.com/ is along the lines of what you were looking for. Regards, Alex.
Class name of parsing the fq clause
Hi, I am querying Solr with an fq clause like: fq=BEGINTIME:[2013-08-25T16:00:00Z TO *] AND BUSID:(M3 OR M9). I am curious about the parsing process and want to study it. Which Java file describes the parsing of the fq clause? Thanks. Regards.
Re: XLSB files not indexed
Hi Otis, In our case there is no exception raised by Tika or Solr; a Lucene document is created, but the content field contains only a few white spaces, as for ODF files. Roland. On Sat, Oct 19, 2013 at 3:54 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Roland, It looks like: Tika - yes, Solr - no? Based on http://search-lucene.com/?q=xlsb ODF != XLSB though, I think... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert reveatw...@gmail.com wrote: Hi, Can someone tell me whether Tika is supposed to extract data from XLSB files (the new MS Office format in binary form)? If so, then it seems that Solr is not able to index them, just as it is not able to index ODF files (a JIRA is already open for ODF: https://issues.apache.org/jira/browse/SOLR-4809). Can someone confirm the problem, or tell me what to do to make Solr work with XLSB files. Regards, Roland.
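One way to narrow down whether the problem is in Tika itself or in Solr's use of it is to run the standalone tika-app jar against a sample file; if plain text comes out here but not through Solr Cell, the problem is on the Solr side. A sketch, assuming a Tika 1.4 tika-app jar and a hypothetical sample file name:

  # extract plain text directly with Tika, bypassing Solr entirely
  java -jar tika-app-1.4.jar --text sample.xlsb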
RE: Facet performance
On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote: Unfortunately the enum solution is normally quite slow when there are enough unique values to trigger the "too many values" exception. [...] [...] And yes, the fc method was terribly slow in a case where it did work. Something like 20 minutes, whereas enum returned within a few seconds. Err.. what? That sounds _very_ strange. You have millions of unique values, so fc should be a lot faster than enum, not the other way around. I assume the 20 minutes was for the first call. How fast do subsequent calls return for fc? Maybe you could provide some approximate numbers?
- Documents in your index
- Unique values in the CONTENT field
- Hits returned from a typical query
- Xmx
Regards, Toke Eskildsen, State and University Library, Denmark
how to debug my own analyzer in solr
Dear Solr experts, I would like to write my own analyzer (a Chinese analyzer) and integrate it into Solr as a plugin. From the log information, the custom analyzer is loaded into Solr successfully, and I define my fieldType with this custom analyzer. Now the problem is that when I try this analyzer from http://localhost:8983/solr/#/collection1/analysis (click Analysis, choose my field type, input some text, then click the Analyse Value button), Solr hangs there; I cannot get any result or response within a few minutes. I also tried to add some data by curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml", and by post.sh in the exampledocs folder. Same issue: Solr hangs there, no result and no response. Can anybody give me some suggestions on how to debug Solr with my own custom analyzer? By the way, when I write a standalone Java program to call my custom analyzer, the result is okay; for example, the following code works well:
==
Analyzer analyzer = new MyAnalyzer();
TokenStream ts = analyzer.tokenStream("text", new StringReader("..."));
CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println(ta.toString());
}
==
Thanks, -Mingz
Ordering Results
Hi, I have a situation where, when a user searches for anything, the suggestions should come first from exact matches and then from fuzzy matches. Suppose we show 15 suggestions: the first 10 results are exact matches, and the remaining 5 results come from fuzzy matches. Can anybody give me suggestions on how to achieve this? Regards, kumar
how to avoid recovery? how to ensure a recovery succeeds?
Hi, guys: I have an online application with SolrCloud 4.1, but I get syncpeer errors every 2 or 3 weeks... As I understand it, a recovery occurs when a replica cannot sync data from its leader successfully. I have seen the topic http://lucene.472066.n3.nabble.com/SolrCloud-5x-Errors-while-recovering-td4022542.html and https://issues.apache.org/jira/i#browse/SOLR-4032, but why do I still get similar errors in SolrCloud 4.1? Is there any setting for syncpeer? How can I reduce the probability of this error, and when a recovery happens, how can I ensure it succeeds? The errors I got look like this:

[2013.10.21 10:39:13.482]2013-10-21 10:39:13,482 WARN [org.apache.solr.handler.SnapPuller] - Error in fetching packets
[2013.10.21 10:39:13.482]java.io.EOFException
[2013.10.21 10:39:13.482] at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
[2013.10.21 10:39:13.482] at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
[2013.10.21 10:39:13.482] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1136)
[2013.10.21 10:39:13.482] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1099)
[2013.10.21 10:39:13.482] at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:39:13.482] at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:39:13.482] at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:39:13.482] at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:39:13.482] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:39:13.482] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:39:13.485]2013-10-21 10:39:13,485 WARN [org.apache.solr.handler.SnapPuller] - Error in fetching packets
[2013.10.21 10:39:13.485]java.io.EOFException
[2013.10.21 10:39:13.485] at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
[2013.10.21 10:39:13.485] at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
[2013.10.21 10:39:13.485] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1136)
[2013.10.21 10:39:13.485] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1099)
[2013.10.21 10:39:13.485] at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:39:13.485] at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:39:13.485] at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:39:13.485] at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:39:13.485] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:39:13.485] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]2013-10-21 10:41:08,461 ERROR [org.apache.solr.handler.ReplicationHandler] - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _fi05_Lucene41_0.pos completely. Downloaded 0!=1485
[2013.10.21 10:41:08.461] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1230)
[2013.10.21 10:41:08.461] at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1110)
[2013.10.21 10:41:08.461] at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:41:08.461] at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:41:08.461] at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]
[2013.10.21 10:41:08.461]2013-10-21 10:41:08,461 ERROR [org.apache.solr.cloud.RecoveryStrategy] - Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:41:08.461] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]
[2013.10.21 10:41:08.555]2013-10-21 10:41:08,462 ERROR
Re: Solr timeout after reboot
Thank you, Otis! I've integrated SPM on my Solr instances and now I have access to monitoring data. Could you give me some hints on which metrics I should watch? Below I've added my query configs. Is there anything I could tweak here?

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <filterCache class="solr.FastLRUCache" size="1000" initialSize="1000" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="1000" initialSize="1000" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="1000" initialSize="1000" autowarmCount="0"/>
  <fieldValueCache class="solr.FastLRUCache" size="1000" initialSize="1000" autowarmCount="0"/>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <queryResultWindowSize>20</queryResultWindowSize>
  <queryResultMaxDocsCached>100</queryResultMaxDocsCached>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">active:true</str>
      </lst>
    </arr>
  </listener>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>10</maxWarmingSearchers>
</query>

- Thanks, Michael
Re: solrconfig.xml carrot2 params
Thanks, I'm new to the clustering libraries. I finally made this connection when I started browsing through the Carrot2 source. I had pulled down a smaller MM document collection from our test environment. It was not ideal, as it was mostly structured, but small. I foolishly thought I could cluster on the text copy field before realizing that it was index-only. Doh!

That is correct -- for the time being, clustering can only be applied to stored Solr fields.

Our documents are indexed in SolrCloud but stored in HBase. I want to allow users to page through Solr hits, but would like to cluster on all (or at least several thousand) of the top search hits. Now I'm puzzling over how to efficiently cluster over possibly several thousand Solr hits when the documents are in HBase. I thought of an HBase coprocessor, but Carrot2 isn't designed for distributed computation. Mahout, in the Hadoop M/R context, seems slow and heavy-handed for this scale; maybe I just need to dig deeper into their library. Or I could just be missing something fundamental? :)

Carrot2 algorithms were not designed to be distributed, but you can still use them in a single-threaded scenario. To do this, you'd probably need to write a bit of code that gets the text of your documents from HBase and runs Carrot2 clustering on it. If you use the STC clustering algorithm, you should be able to process several thousand documents in a reasonable time (on the order of seconds). The clustering side of the code should be a matter of a few lines of code (http://download.carrot2.org/stable/javadoc/overview-summary.html#clustering-documents). The tricky bit of the setup may be efficiently getting the text for clustering -- it can happen that fetching takes longer than the actual clustering. S.
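For readers who want the shape of that "few lines of code": a minimal sketch against the Carrot2 3.x Java API, with the HBase fetch reduced to a hard-coded placeholder document. Everything outside the Carrot2 calls (class name, example text, the query string) is invented for illustration:

  import java.util.ArrayList;
  import java.util.List;

  import org.carrot2.clustering.stc.STCClusteringAlgorithm;
  import org.carrot2.core.Cluster;
  import org.carrot2.core.Controller;
  import org.carrot2.core.ControllerFactory;
  import org.carrot2.core.Document;
  import org.carrot2.core.ProcessingResult;

  public class HBaseHitClustering {
    public static void main(String[] args) {
      // in real use, build this list from the top-N Solr hits
      // fetched out of HBase; Document takes (title, body text)
      List<Document> docs = new ArrayList<Document>();
      docs.add(new Document("Example title", "Example body text fetched from HBase"));

      // single-threaded, in-process clustering with STC
      Controller controller = ControllerFactory.createSimple();
      ProcessingResult result =
          controller.process(docs, "user query", STCClusteringAlgorithm.class);

      for (Cluster cluster : result.getClusters()) {
        System.out.println(cluster.getLabel()
            + " (" + cluster.getAllDocuments().size() + " docs)");
      }
    }
  }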
Re: how to debug my own analyzer in solr
More information about this: the custom analyzer just implements createComponents() of Analyzer, and my configuration in schema.xml is something like:

<fieldType name="text_cn" class="solr.TextField">
  <analyzer class="my.package.CustomAnalyzer"/>
</fieldType>

In the log I cannot see any error information; however, when I analyze or add document data, it always hangs. Any way to debug or narrow down the problem? Thanks in advance. -Mingz
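An editorial note for readers hitting the same symptom: one frequent cause of "works in a standalone test, hangs inside Solr" is that Solr reuses analysis components. createComponents() is called once per thread, and the Tokenizer is then re-fed new input through reset(); if the custom Tokenizer consumes its input in the constructor, or its incrementToken() never returns false after reuse, the analysis loop can spin forever. This is a guess at the cause, not a diagnosis from the thread. A minimal sketch of the expected structure for Lucene/Solr 4.x (MyTokenizer is hypothetical):

  import java.io.Reader;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Tokenizer;

  public final class CustomAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      // called once per thread; the Tokenizer is reused and re-fed input
      // via reset(), so per-document state must be (re)initialized in the
      // Tokenizer's reset()/incrementToken(), not in its constructor
      Tokenizer source = new MyTokenizer(reader);
      return new TokenStreamComponents(source);
    }
  }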
Error: Repeated service interruptions - failure processing document: Read timed out
Hi, I just installed Solr and when running a job I get the following problem: "Error: Repeated service interruptions - failure processing document: Read timed out". Like I said, I just installed Solr and so am very new to the topic. (On Windows 2008 R2: Solr 4.4, Tomcat 7.0.42, ManifoldCF 1.3, PostgreSQL 9.1.1.) In the Tomcat log I find the following error:

ERROR - 2013-10-21 09:35:16.551; org.apache.solr.common.SolrException; null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. null
at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:492)
at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:626)
at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:143)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:342)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException
at org.apache.coyote.http11.InternalAprInputBuffer.fill(InternalAprInputBuffer.java:607)
at org.apache.coyote.http11.InternalAprInputBuffer$SocketInputBuffer.doRead(InternalAprInputBuffer.java:642)
at org.apache.coyote.http11.filters.ChunkedInputFilter.readBytes(ChunkedInputFilter.java:275)
at org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:377)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:147)
at org.apache.coyote.http11.InternalAprInputBuffer.doRead(InternalAprInputBuffer.java:534)
at org.apache.coyote.Request.doRead(Request.java:422)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:449)
at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)
at java.io.FilterInputStream.read(Unknown Source)
at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:125)
at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
at java.io.InputStream.read(Unknown Source)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
... 21 more
Re: how to debug my own analyzer in solr
Thread dump and/or remote debugging?! Cheers, Siegfried Goeschl
Re: how to debug my own analyzer in solr
Hi Mingz, If you use Eclipse, you can debug Solr with your plugin like this:

# go to the Solr install directory
$ cd $SOLR
$ ant run-example -Dexample.debug=true

Then connect to the JVM from Eclipse via remote debug port 5005. Good luck! koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
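If you are not building from source with ant, the equivalent is to start the example Jetty with the standard JDWP flags yourself and attach Eclipse (or any debugger) to port 5005; these are ordinary JVM debug options, not anything Solr-specific:

  java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar start.jar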
Re: Ordering Results
Do two searches. Why do you want to do this, though? It seems a bit strange. Presumably your users want the best matches possible, whether exact or fuzzy? Wouldn't it be best to return both exact and fuzzy matches, but score the exact ones above the fuzzy ones? Upayavira
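To make the "score exact above fuzzy" idea concrete: a single query can OR the two forms together with a large boost on the exact clause, so exact hits sort first and fuzzy hits fill the remainder. A sketch only; the field name title and the boost/edit-distance values are invented:

  q=title:"okkadu telugu movie"^100 OR title:(okkadu~2 telugu movie)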
Re: SolrCloud Performance Issue
Shamik: You're right, the use of NOW shouldn't be making that much of a difference between versions. FYI, though, here's a way to use NOW and re-use fq clauses: http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/ It may well be this setting:

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Every second (assuming you're indexing), you're throwing away all your top-level caches and executing any autowarm queries, etc. And if you _don't_ have any autowarming queries, you may not be filling the caches at all, an expensive omission. Try lengthening that out to, say, a minute (60000 ms) or even longer and see if that makes a difference. If that's the culprit, you at least have a place to start. If that's not it, it's also possible you're seeing decompression. How many documents are you returning, and how big are they? There are some anecdotal comments that the default stored-field decompression for either a large number of docs or very large docs may be playing a role here. Try setting fl=id (don't return any stored fields). If that is faster, this might be your problem. queryResultCache is often not very high re: hit ratio. It's usually used for paging, so if your users aren't hitting the next page you may not hit it much. Best, Erick On Sat, Oct 19, 2013 at 4:12 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, What happens if you have just 1 shard - no distributed search, like before? SPM for Solr or any other monitoring tool that captures OS and Solr metrics should help you find the source of the problem faster. Is disk IO the same? Utilization of caches? JVM version, heap, etc.? CPU usage? Network? I'd look at each of these things side by side and look for big differences. Otis -- Solr ElasticSearch Support -- http://sematext.com/ SOLR Performance Monitoring -- http://sematext.com/spm On Fri, Oct 18, 2013 at 1:38 AM, shamik sham...@gmail.com wrote: I tried commenting out NOW in bq, but it didn't make any difference in the performance. I do see a minor entry in the queryfiltercache rate, which is a meager 0.02. I'm really struggling to figure out the bottleneck; any known pain points I should be checking?
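The trick in the linked article, in one line: a raw NOW makes every fq textually unique, so it can never be served from the filter cache, while rounding with date math keeps the clause stable for the whole rounding window. The field name timestamp is illustrative:

  fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]    <- same cache entry all day
  fq=timestamp:[NOW-7DAYS TO NOW]            <- new cache entry on every request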
Re: caching HTML pages in SOLR
You can also try: https://www.varnish-cache.org/
Re: ExtractRequestHandler, skipping errors
Guido, can you point us to the Commons-Compress JIRA issue which reports your particular problem? Perhaps uncompress works just fine? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 18. okt. 2013 kl. 14:48 skrev Guido Medina guido.med...@temetra.com: Don't; commons-compress 1.5 is broken, either use 1.4.1 or later. Our app stopped compressing properly after a Maven update. Guido. On 18/10/13 12:40, Roland Everaert wrote: I will open a JIRA issue; I suppose I just have to create an account first? Regards, Roland. On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi, I think the flag cannot ignore NoSuchMethodError. There may be something wrong here? ... I've just checked my Solr 4.5 directories and I found the Tika version is 1.4. Tika 1.4 seems to use commons-compress 1.5: http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup But I see commons-compress-1.4.1.jar in the solr/contrib/extraction/lib/ directory. Can you open a JIRA issue? For now, you can get commons-compress 1.5 and put it in the directory (don't forget to remove the 1.4.1 jar file). koji (13/10/18 16:37), Roland Everaert wrote: Hi, We already configured the ExtractingRequestHandler to ignore Tika exceptions, but it is Solr that complains. The customer managed to reproduce the problem. Following is the error from solr.log. The file type that caused this exception was WMZ. It seems that something is missing in a Solr class. We use Solr 4.4.

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
... 16 more

On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Roland, (13/10/17 20:44), Roland Everaert wrote:
Question about docvalues
Hi, If I have a field (named dv_field) configured as indexed, stored, and with docValues=true, how do I know that when I run a query like q=*:*&facet=true&facet.field=dv_field I'm really using the docValues and not the normal way? Is it necessary to duplicate the field and set indexed and stored to false, leaving only the docValues property set to true? - Best regards
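For reference, the single-field setup the question describes would look like this in schema.xml (the string type here is an assumption; any docValues-capable type behaves the same way):

  <field name="dv_field" type="string" indexed="true" stored="true" docValues="true"/>

As I understand Solr 4.x, no index-only duplicate is needed for a single-valued field: when docValues exist for a field, the field-cache lookup that faceting and sorting go through picks up the on-disk docValues instead of un-inverting the indexed terms.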
Re: Solr timeout after reboot
Have you tried this old trick to warm the FS cache? cat .../core/data/index/* > /dev/null Peter
Re: Solr timeout after reboot
Hmm, no, I haven't... What would be the effect of this? - Thanks, Michael
Re: Solr timeout after reboot
To put the file data into the file system cache, which would make for faster access. François On Oct 21, 2013, at 8:33 AM, michael.boom my_sky...@yahoo.com wrote: Hmm, no, I haven't... What would be the effect of this? - Thanks, Michael
Exact Match Results
I am querying Solr for exact-match results, but it is showing some other results also. Example: user query string: Okkadu telugu movie. Results:
1. Okkadu telugu movie
2. Okkadunnadu telugu movie
3. YuganikiOkkadu telugu movie
4. Okkadu telugu movie stills
How can we order these results so that the 4th result comes second? Can anyone give me an idea, please?
Re: Exact Match Results
Kumar, You might want to look into the 'pf' parameter: https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser François
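In short, pf re-boosts documents in which the whole query appears as a phrase. A sketch, assuming the movie names live in a field called title (the field name and boost are illustrative):

  q=Okkadu telugu movie&defType=edismax&qf=title&pf=title^10

With that, "Okkadu telugu movie stills" contains the full phrase and receives the phrase boost, so it should rank above near-matches like "Okkadunnadu telugu movie", given the same analysis that produced the matches shown above.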
Re: Class name of parsing the fq clause
Start with org.apache.solr.handler.component.QueryComponent#prepare, which fetches the fq parameters and indirectly invokes the query parser(s):

String[] fqs = req.getParams().getParams(CommonParams.FQ);
if (fqs != null && fqs.length != 0) {
  List<Query> filters = rb.getFilters();
  // if filters already exist, make a copy instead of modifying the original
  filters = filters == null ? new ArrayList<Query>(fqs.length) : new ArrayList<Query>(filters);
  for (String fq : fqs) {
    if (fq != null && fq.trim().length() != 0) {
      QParser fqp = QParser.getParser(fq, null, req);
      filters.add(fqp.getQuery());
    }
  }
  // only set the filters if they are not empty, otherwise
  // fq=&someotherParam= will trigger the all-docs filter for every request
  // if the filter cache is disabled
  if (!filters.isEmpty()) {
    rb.setFilters(filters);
  }
}

Note that this line actually invokes the parser:

filters.add(fqp.getQuery());

Then in org.apache.solr.search.QParser#getParser:

QParserPlugin qplug = req.getCore().getQueryPlugin(parserName);
QParser parser = qplug.createParser(qstr, localParams, req.getParams(), req);

And for the common case of the Lucene query parser, org.apache.solr.search.LuceneQParserPlugin#createParser:

public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
  return new LuceneQParser(qstr, localParams, params, req);
}

And then in org.apache.solr.search.QParser#getQuery:

public Query getQuery() throws SyntaxError {
  if (query == null) {
    query = parse();

And then in org.apache.solr.search.LuceneQParser#parse:

lparser = new SolrQueryParser(this, defaultField);
lparser.setDefaultOperator(QueryParsing.getQueryParserDefaultOperator(getReq().getSchema(), getParam(QueryParsing.OP)));
return lparser.parse(qstr);

And then in org.apache.solr.parser.SolrQueryParserBase#parse:

Query res = TopLevelQuery(null); // pass null so we can tell later if an explicit field was provided or not

And then in org.apache.solr.parser.QueryParser#TopLevelQuery, the parsing begins. org.apache.solr.parser.QueryParser.jj is the grammar for a basic Solr/Lucene query, org.apache.solr.parser.QueryParser.java is generated from it by JavaCC, and a lot of the logic is in the base class of the generated class, org.apache.solr.parser.SolrQueryParserBase.java. Good luck! Happy hunting! -- Jack Krupansky
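A practical companion to this walkthrough: you can watch the end product of the whole chain without a debugger by adding debugQuery=true to a request. The debug section of the response lists each raw fq under filter_queries and the toString() of each parsed Query under parsed_filter_queries, so the request that started this thread:

  q=*:*&fq=BEGINTIME:[2013-08-25T16:00:00Z TO *] AND BUSID:(M3 OR M9)&debugQuery=true

would show roughly +BEGINTIME:[2013-08-25T16:00:00 TO *] +(BUSID:M3 BUSID:M9) as the parsed form; the exact rendering depends on the field types in the schema.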
Re: Solr timeout after reboot
I found this warming to be especially necessary after starting an instance on those m3.xlarge servers; otherwise the response times for the first few minutes were terrible. Peter
Re: Solr timeout after reboot
I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G, so I guess running the above command would eat all available memory. - Thanks, Michael
Re: Solr timeout after reboot
Well no, the OS is smarter than that: it manages the file system cache along with other memory requirements. If applications need more memory, then the file system cache will likely be reduced. The command is a cheap trick to get the OS to fill the file system cache as quickly as possible; not sure how much it will help, though, with a 100GB index on a 15GB machine. This might work if you 'cat' only the index files other than the '.fdx' and '.fdt' files. François
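A sketch of that selective warming with standard shell tools; the index path is a placeholder, and skipping .fdt/.fdx (the stored-field data and its index) is the suggestion from the message above:

  # warm everything except the stored-field files
  find /path/to/core/data/index -type f ! -name '*.fdt' ! -name '*.fdx' -exec cat {} + > /dev/null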
Re: Class name of parsing the fq clause
Hi Jack, Thanks a lot for your explanation.
RE: Facet performance
On Mon, October 21, 2013 10:04 AM, Toke Eskildsen wrote: On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote: Toke Eskildsen wrote: Unfortunately the enum solution is normally quite slow when there are enough unique values to trigger the "too many values" exception. [...] [...] And yes, the fc method was terribly slow in a case where it did work. Something like 20 minutes whereas enum returned within a few seconds. Err.. What? That sounds _very_ strange. You have millions of unique values so fc should be a lot faster than enum, not the other way around. I assume the 20 minutes was for the first call. How fast does subsequent calls return for fc?

QTime enum: 1st call: 1200; subsequent calls: 200
QTime fc: never returns; the web server restarts itself after 30 min at 100% CPU load

This is on the test system; the production system managed to return with "Too many values for UnInvertedField faceting". However, I also have different faceting queries I played with today. One complete example:

q=ottomotor&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

These are the results, all with facet.method=enum (fc doesn't work). They were executed in the sequence shown, on an otherwise unused server:

QTime=41205   facet.prefix=    q=frequent_word         numFound=44532
Same query repeated:
QTime=225810  facet.prefix=    q=ottomotor             numFound=909
QTime=199839  facet.prefix=    q=ottomotor             numFound=909
QTime=0       facet.prefix=    q=ottomotor jkdhwjfh    numFound=0
QTime=0       facet.prefix=    q=jkdhwjfh              numFound=0
QTime=185948  facet.prefix=    q=ottomotor             numFound=909
QTime=3344    facet.prefix=d   q=ottomotor             numFound=909
QTime=3078    facet.prefix=d   q=ottomotor             numFound=909
QTime=3141    facet.prefix=d   q=ottomotor             numFound=909

The response time is obviously not dependent on the number of documents found. Caching doesn't kick in either. Maybe you could provide some approximate numbers? I'll try, see below. Thanks for asking and having a closer look.
- Documents in your index: 13,434,414
- Unique values in the CONTENT field: Not sure how to get this. In Luke I find 21,797,514 "term count CONTENT". Is that what you mean?
- Hits returned from a typical query: Hm, that can be anything between 0 and 40,000 or more. Or do you mean from the facets? Or do my tests above answer it?
- Xmx: The maximum the system allows me to get: 1612m.
Maybe I have a hopelessly under-dimensioned server for this sort of thing? Thanks a lot for your help, Michael
Re: Solr timeout after reboot
On 10/21/2013 8:03 AM, michael.boom wrote: I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G, so I guess putting running the above command would bite all available memory. With a 100GB index, I would want a minimum server memory size of 64GB, and I would much prefer 128GB. If you shard your index, then each machine will require less memory, because each one will have less of the index onboard. Running a big Solr install is usually best handled on bare metal, because it loves RAM, and getting a lot of memory in a virtual environment is quite expensive. It's also expensive on bare metal too, but unlike Amazon, more memory doesn't increase your monthly cost. With only 15GB total RAM and an index that big, you're probably giving at least half of your RAM to Solr, leaving *very* little for the OS disk cache, compared to your index size. The ideal cache size is the same as your index size, but you can almost always get away with less. http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache If you try the cat trick with your numbers, it's going to take forever every time you run it, it will kill your performance while it's happening, and only the last few GB that it reads will remain in the OS disk cache. Chances are that it will be the wrong part of the index, too. You only want to cat your entire index if you have enough free RAM to *FIT* your entire index. If you *DO* have that much free memory (which for you would require a total RAM size of about 128GB), then the first time will take quite a while, but every time you do it after that, it will happen nearly instantly, because it will not have to actually read the disk at all. You could try only doing the cat on certain index files, but when you don't have enough cache for the entire index, running queries will do a better job of filling the cache intelligently. The first bunch of queries will be slow. Summary: You need more RAM. Quite a bit more RAM. Thanks, Shawn
Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")
Did you completely reindex your data after emptying the stop words file? -- Jack Krupansky -----Original Message----- From: Stavros Delisavas Sent: Monday, October 21, 2013 10:05 AM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

Okay, I emptied the stopword file. I don't know where the word list came from; I have never seen it and never touched that file. Anyway... Now my queries do work with one stopword, like "in" or "to", but they still do not work when I use more than one stopword within one query. Instead of too many results I now get NO results at all. What could be the problem?

On 17.10.2013 15:02, Jack Krupansky wrote: The default Solr stopwords.txt file is empty, so SOMEBODY created that non-empty stop words file. The StopFilterFactory token filter in the field type analyzer controls stop word processing. You can remove that step entirely, or different field types can reference different stop word files, or some field type analyzers can use the stop filter and some can omit it. This does mean that you would have to use different field types for fields that want different stop word processing. -- Jack Krupansky -----Original Message----- From: Stavros Delisavas Sent: Thursday, October 17, 2013 3:27 AM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

Thank you, I found the file with the stopwords and noticed that my local file is empty (comments only) and the one on my webserver has a big list of English stopwords. That seems to be the problem. I think in general it is a good idea to use stopwords for random searches, but it is not useful in my special case. Is there a way to (de)activate stopwords query-wise? I would like to ignore stopwords when searching in titles, but use stopwords when users do a fulltext search on whole articles, etc. Thanks again, Stavros

On 17.10.2013 09:13, Upayavira wrote: Stopwords are small words such as "and", "the" or "is" that we might choose to exclude from our documents and queries because they are such common terms. Once you have stripped stop words from your above query, all that is left is the word "wild", or so is being suggested. Somewhere in your config, close to solrconfig.xml, you will find a file called something like stopwords.txt. Compare these files between your two systems. Upayavira

On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote: Unfortunately, I don't really know what stopwords are. I would like it to not ignore any words of my query. How/where can I change this stopwords behaviour?

On 16.10.2013 23:45, Jack Krupansky wrote: So, the stopwords.txt file is different between the two systems - the first has stop words but the second does not. Did you expect stop words to be removed, or not? -- Jack Krupansky -----Original Message----- From: Stavros Delsiavas Sent: Wednesday, October 16, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

Okay, I understand; here's the rawquerystring. It was at about line 3000:

<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:wild*</str>
  <str name="parsedquery_toString">+title:wild*</str>

At this place the debug output DOES differ from the one on my local system, but I don't understand why. This is the local debug output:

<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:into +title:the +title:wild*</str>
  <str name="parsedquery_toString">+title:into +title:the +title:wild*</str>

Why is that? Any ideas?

On 16.10.2013 21:03, Shawn Heisey wrote: On 10/16/2013 4:46 AM, Stavros Delisavas wrote: My local Solr gives me http://pastebin.com/Q6d9dFmZ and my webserver this: http://pastebin.com/q87WEjVA. I copied only the first few hundred lines (of more than 8000) because the webserver output was too big even for pastebin. On 16.10.2013 12:27, Erik Hatcher wrote: What does the debug output from debugQuery=true say between the two? What's really needed here is the first part of the debug section, which has rawquerystring, querystring, parsedquery, and parsedquery_toString. The info from your local Solr has this part, but what you pasted from the webserver one didn't include those parts, because it's further down than the first few hundred lines. Thanks, Shawn
Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")
I did a full-import again. That solved the issue. I didn't know that the stopwords apply on the indexing itself too. Thanks a lot, Stavros Am 21.10.2013 17:13, schrieb Jack Krupansky: Did you completely reindex your data after emptying the stop words file? -- Jack Krupansky -Original Message- From: Stavros Delisavas Sent: Monday, October 21, 2013 10:05 AM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or) Okay, I emtpied the stopword file. I don't know where the wordlist came from. I have never seen this and never touched that file. Anyways... Now my queries do work with one word, like in or to but the queries still do not work when I use more than one stopword within one query. Instead of too many results I now get NO results at all. What could be the problem? On 17.10.2013 15:02, Jack Krupansky wrote: The default Solr stopwords.txt file is empty, so SOMEBODY created that non-empty stop words file. The StopFilterFactory token filter in the field type analyzer controls stop word processing. You can remove that step entirely, or different field types can reference different stop word files, or some field type analyzers can use the stop filter and some would not have it. This does mean that you would have to use different field types for fields that want different stop word processing. -- Jack Krupansky -Original Message- From: Stavros Delisavas Sent: Thursday, October 17, 2013 3:27 AM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or) Thank you, I found the file with the stopwords and noticed that my local file is empty (comments only) and the one on my webserver has a big list of english stopwords. That seems to be the problem. I think in general it is a good idea to use stopwords for random searches, but it is not usefull in my special case. Is there a way to (de)activate stopwords query-wise? Like I would like to ignore stopwords when searching in titles but I would like to use stopwords when users do a fulltext-search on whole articles, etc. Thanks again, Stavros On 17.10.2013 09:13, Upayavira wrote: Stopwords are small words such as and, the or is,that we might choose to exclude from our documents and queries because they are such common terms. Once you have stripped stop words from your above query, all that is left is the word wild, or so is being suggested. Somewhere in your config, close to solr config.xml, you will find a file called something like stopwords.txt. Compare these files between your two systems. Upayavira On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote: Unfortunatly, I don't really know what stopwords are. I would like it to not ignore any words of my query. How/Where can I change this stopwords-behaviour? Am 16.10.2013 23:45, schrieb Jack Krupansky: So, the stopwords.txt file is different between the two systems - the first has stop words but the second does not. Did you expect stop words to be removed, or not? -- Jack Krupansky -Original Message- From: Stavros Delsiavas Sent: Wednesday, October 16, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or) Okay I understand, here's the rawquerystring. 
It was at about line 3000:
<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:wild*</str>
  <str name="parsedquery_toString">+title:wild*</str>
At this place the debug output DOES differ from the one on my local system. But I don't understand why... This is the local debug output:
<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:into +title:the +title:wild*</str>
  <str name="parsedquery_toString">+title:into +title:the +title:wild*</str>
Why is that? Any ideas? Am 16.10.2013 21:03, schrieb Shawn Heisey: On 10/16/2013 4:46 AM, Stavros Delisavas wrote: My local solr gives me: http://pastebin.com/Q6d9dFmZ and my webserver this: http://pastebin.com/q87WEjVA I copied only the first few hundred lines (of more than 8000) because the webserver output was too big even for pastebin. On 16.10.2013 12:27, Erik Hatcher wrote: What does the debug output from debugQuery=true say between the two? What's really needed here is the first part of the debug section, which has rawquerystring, querystring, parsedquery, and parsedquery_toString. The info from your local solr has this part, but what you pasted from the webserver one didn't include those parts, because it's further down than the first few hundred lines. Thanks, Shawn
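To see the stop filter doing this in isolation, here is a minimal, self-contained Lucene 4.x sketch; it assumes the webserver's analyzer uses a stop list like StopAnalyzer's default English set, which contains both "into" and "the":

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.StopAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class StopwordDemo {
      public static void main(String[] args) throws Exception {
        // StopAnalyzer ships with a default English stop set containing "into" and "the"
        StopAnalyzer analyzer = new StopAnalyzer(Version.LUCENE_44);
        TokenStream ts = analyzer.tokenStream("title", new StringReader("into the wild"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term.toString()); // prints only "wild"
        }
        ts.end();
        ts.close();
      }
    }

Run against a field type whose analyzer has no stop filter, the same input keeps all three tokens, which matches the local/webserver difference Stavros saw.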
SolrCloud performance in VM environment
Hi everyone, I've been working on an installation recently which uses SolrCloud to index 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2 identical VMs set up for replicas). The reason we're using so many shards for a relatively small index is that there are complex filtering requirements at search time, to restrict users to items they are licensed to view. Initial tests demonstrated that multiple shards would be required. The total size of the index is about 140GB, and each VM has 16GB RAM (32GB total) and 4 CPU units. I know this is far under what would normally be recommended for an index of this size, and I'm working on persuading the customer to increase the RAM (basically, telling them it won't work otherwise). Performance is currently pretty poor and I would expect more RAM to improve things. However, there are a couple of other oddities which concern me. The first is that I've been reindexing a fixed set of 500 docs to test indexing and commit performance (with soft commits within 60s). The time taken to complete a hard commit after this is longer than I'd expect, and highly variable - from 10s to 70s. This makes me wonder whether the SAN (which provides all the storage for these VMs and the customer's several other VMs) is being saturated periodically. I grabbed some iostat output on different occasions to (possibly) show the variability:
Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      64.50  0.00        2476.00     0         4952
...
sdb      8.90   0.00        348.00      0         6960
...
sdb      1.15   0.00        43.20       0         864
The other thing that confuses me is that after a Solr restart or hard commit, search times average about 1.2s under light load. After searching the same set of queries for 5-6 iterations this improves to 0.1s. However, in either case - cold or warm - iostat reports no device reads at all:
Device:  tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      0.40  0.00        8.00        0         160
...
sdb      0.30  0.00        10.40       0         104
(the writes are due to logging). This implies to me that the 'hot' blocks are being completely cached in RAM - so why the variation in search time and the number of iterations required to speed it up? The Solr caches are only being used lightly by these tests and there are no evictions. GC is not a significant overhead. Each Solr shard runs in a separate JVM with 1GB heap. I don't have a great deal of experience in low-level performance tuning, so please forgive any naivety. Any ideas of what to do next would be greatly appreciated. I don't currently have details of the VM implementation but can get hold of this if it's relevant. thanks, Tom
RE: SolrCloud performance in VM environment
some basic tips.
- try to create enough shards that you can get the size of each index portion on the shard closer to the amount of RAM you have on each node (e.g. if you have a ~140GB index on 16GB nodes, try doing 12-16 shards)
- start with just the initial shards, add replicas later when you have dialed things in a bit more
- try to leave some memory for the OS as well as the JVM
- try starting with 1/2 of the total RAM on each VM allocated to the JVM as the Xmx value
- try setting Xms in the range of 0.75 to 1.0 of Xmx
- do all the normal JVM tuning, especially the part about capturing the GC events in a log so that you can see what is going on with Java itself... this will probably lead you to adjust your GC type, etc.
- make sure you aren't hammering your storage devices (or the interconnects between your servers and your storage)... the OS internal tools on the guest are helpful, but you probably want to look at the hypervisor and storage device layer directly as well. On VMware the built-in perf graphs for datastore latency and network throughput are easily observed; esxtop is the CLI tool which provides the same.
- if you are using a SAN, you probably want to make sure you have some sort of MPIO in place (especially if you are using 1Gb iSCSI)
From: Tom Mortimer tom.m.f...@gmail.com Sent: Monday, October 21, 2013 08:48 To: solr-user@lucene.apache.org Subject: SolrCloud performance in VM environment Hi everyone, I've been working on an installation recently which uses SolrCloud to index 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2 identical VMs set up for replicas). The reason we're using so many shards for a relatively small index is that there are complex filtering requirements at search time, to restrict users to items they are licensed to view. Initial tests demonstrated that multiple shards would be required. The total size of the index is about 140GB, and each VM has 16GB RAM (32GB total) and 4 CPU units. I know this is far under what would normally be recommended for an index of this size, and I'm working on persuading the customer to increase the RAM (basically, telling them it won't work otherwise). Performance is currently pretty poor and I would expect more RAM to improve things. However, there are a couple of other oddities which concern me. The first is that I've been reindexing a fixed set of 500 docs to test indexing and commit performance (with soft commits within 60s). The time taken to complete a hard commit after this is longer than I'd expect, and highly variable - from 10s to 70s. This makes me wonder whether the SAN (which provides all the storage for these VMs and the customer's several other VMs) is being saturated periodically. I grabbed some iostat output on different occasions to (possibly) show the variability:
Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      64.50  0.00        2476.00     0         4952
...
sdb      8.90   0.00        348.00      0         6960
...
sdb      1.15   0.00        43.20       0         864
The other thing that confuses me is that after a Solr restart or hard commit, search times average about 1.2s under light load. After searching the same set of queries for 5-6 iterations this improves to 0.1s. However, in either case - cold or warm - iostat reports no device reads at all:
Device:  tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      0.40  0.00        8.00        0         160
...
sdb      0.30  0.00        10.40       0         104
(the writes are due to logging). This implies to me that the 'hot' blocks are being completely cached in RAM - so why the variation in search time and the number of iterations required to speed it up?
The Solr caches are only being used lightly by these tests and there are no evictions. GC is not a significant overhead. Each Solr shard runs in a separate JVM with 1GB heap. I don't have a great deal of experience in low-level performance tuning, so please forgive any naivety. Any ideas of what to do next would be greatly appreciated. I don't currently have details of the VM implementation but can get hold of this if it's relevant. thanks, Tom
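One way to put numbers on commit latency like this is a small SolrJ timing harness. A rough sketch against the Solr 4.x API - the URL and core name are placeholders:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitTimer {
      public static void main(String[] args) throws Exception {
        // URL and core name are placeholders
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        long start = System.currentTimeMillis();
        // waitFlush=true, waitSearcher=true: returns once the new searcher is registered
        server.commit(true, true);
        System.out.println("hard commit took " + (System.currentTimeMillis() - start) + " ms");
        server.shutdown();
      }
    }

Running this repeatedly, during and outside the suspected SAN-contention windows, would show whether the 10s-70s variability tracks storage load.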
Re: Question about docvalues
I really don't understand the question. What behavior are you seeing that leads you to ask? bq: Is it necessary to duplicate the field and set indexed and stored to false and If this means setting _both_ indexed and stored to false, then you effectively throw the field completely away, there's no point in doing this. FWIW, Erick On Mon, Oct 21, 2013 at 1:39 PM, yriveiro yago.rive...@gmail.com wrote: Hi, If I have a field (named dv_field) configured to be indexed, stored and with docValues=true, how do I know that when I do a query like q=*:*&facet=true&facet.field=dv_field I'm really using the docValues and not the normal way? Is it necessary to duplicate the field and set indexed and stored to false, leaving the docValues property set to true? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html Sent from the Solr - User mailing list archive at Nabble.com.
Pivot faceting not working after upgrading to 4.5
Hello, We have a rather weird behavior I don't really understand. As written in a few other threads, we're migrating from a master/slave setup running 4.3 to a SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances have been re-indexed under 4.5, obviously). The following query works fine under our 4.3 setup: ?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0 However, in our 4.5 setup, the facet_pivot entry in the facet_count is straight up missing from the response. I've been digging around the logs for a bit, but I'm unable to find anything relating to this. If I remove one of the facet.pivot elements (i.e. only having facet.pivot=facet_category) I get an error as expected, so that part of the component is at least working. Does anyone have an idea of something obvious I might have missed? I've been unable to find any change logs suggesting changes to this part of the facet component. Thanks. Regards, Henrik
Re: Pivot faceting not working after upgrading to 4.5
I realise now that distributed pivot faceting is not implemented yet in SolrCloud, after some digging through the internet. Apologies :) Den 21/10/2013 kl. 18.20 skrev Henrik Ossipoff Hansen h...@entertainment-trading.com: Hello, We have a rather weird behavior I don't really understand. As written in a few other threads, we're migrating from a master/slave setup running 4.3 to a SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances have been re-indexed under 4.5, obviously). The following query works fine under our 4.3 setup: ?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0 However, in our 4.5 setup, the facet_pivot entry in the facet_count is straight up missing from the response. I've been digging around the logs for a bit, but I'm unable to find anything relating to this. If I remove one of the facet.pivot elements (i.e. only having facet.pivot=facet_category) I get an error as expected, so that part of the component is at least working. Does anyone have an idea of something obvious I might have missed? I've been unable to find any change logs suggesting changes to this part of the facet component. Thanks. Regards, Henrik
Re: Question about docvalues
Sorry if I didn't make myself understood; my English is not too good. My goal is to remove pressure from the heap: my indexes are too big, the heap gets full very quickly, and I get an OOM. I read about docValues stored on disk, but I don't know how to configure it. I read this link: https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues which has an example of how to configure a field to use docValues: <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" /> With this configuration it is obvious that I will use docValues. Q: With this configuration, can I retrieve the field value in a normal search, or does it still need to be stored? If I have a field configured as: <field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" /> and I do a facet query on the manu_exact field: q=*:*&facet=true&facet.field=manu_exact Q: Do I leverage the docValues feature? That is, does docValues always take precedence over the regular faceting method when it is set? Q: Does it make sense for the field to be indexed if I have docValues? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote: I really don't understand the question. What behavior are you seeing that leads you to ask? bq: Is it necessary to duplicate the field and set indexed and stored to false and If this means setting _both_ indexed and stored to false, then you effectively throw the field completely away, there's no point in doing this. FWIW, Erick On Mon, Oct 21, 2013 at 1:39 PM, yriveiro yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote: Hi, If I have a field (named dv_field) configured to be indexed, stored and with docValues=true, how do I know that when I do a query like q=*:*&facet=true&facet.field=dv_field I'm really using the docValues and not the normal way? Is it necessary to duplicate the field and set indexed and stored to false, leaving the docValues property set to true? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
Re: Question about docvalues
Hello Yago, To my knowledge, in facet calculations docValues take precedence over other methods. So, even if your field is also stored and indexed, your facets won't use the inverted index or fieldValueCache when docValues are present. I think you will still have to store and index the field to maintain your other functionality. DocValues are helpful only for facets and sorting, to my knowledge. Hope this helps, Gun Akkor www.carbonblack.com Sent from my iPhone On Oct 21, 2013, at 12:41 PM, Yago Riveiro yago.rive...@gmail.com wrote: Sorry if I didn't make myself understood; my English is not too good. My goal is to remove pressure from the heap: my indexes are too big, the heap gets full very quickly, and I get an OOM. I read about docValues stored on disk, but I don't know how to configure it. I read this link: https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues which has an example of how to configure a field to use docValues: <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" /> With this configuration it is obvious that I will use docValues. Q: With this configuration, can I retrieve the field value in a normal search, or does it still need to be stored? If I have a field configured as: <field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" /> and I do a facet query on the manu_exact field: q=*:*&facet=true&facet.field=manu_exact Q: Do I leverage the docValues feature? That is, does docValues always take precedence over the regular faceting method when it is set? Q: Does it make sense for the field to be indexed if I have docValues? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote: I really don't understand the question. What behavior are you seeing that leads you to ask? bq: Is it necessary to duplicate the field and set indexed and stored to false and If this means setting _both_ indexed and stored to false, then you effectively throw the field completely away, there's no point in doing this. FWIW, Erick On Mon, Oct 21, 2013 at 1:39 PM, yriveiro yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote: Hi, If I have a field (named dv_field) configured to be indexed, stored and with docValues=true, how do I know that when I do a query like q=*:*&facet=true&facet.field=dv_field I'm really using the docValues and not the normal way? Is it necessary to duplicate the field and set indexed and stored to false, leaving the docValues property set to true? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
Re: Exact Match Results
You need to provide us with the fieldType information. If you just want to match the phrase entered by the user, you can use KeywordTokenizerFactory. Reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Creates org.apache.lucene.analysis.core.KeywordTokenizer. Treats the entire field as a single token, regardless of its content. Example: "http://example.com/I-am+example?Text=-Hello" == "http://example.com/I-am+example?Text=-Hello" -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096846.html Sent from the Solr - User mailing list archive at Nabble.com.
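For illustration, a minimal Lucene 4.x sketch of that behaviour - the input string is just an example, not from the thread:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class KeywordTokenizerDemo {
      public static void main(String[] args) throws Exception {
        // KeywordTokenizer emits the entire input as one token, spaces and all
        Tokenizer tok = new KeywordTokenizer(new StringReader("Okkadu telugu movie stills"));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        while (tok.incrementToken()) {
          System.out.println(term.toString()); // prints the whole string exactly once
        }
        tok.end();
        tok.close();
      }
    }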
Re: Question about docvalues
Hi Gun, Thanks for the response. Indeed, I only want docValues to do facets. IMHO, a reference to the fact that docValues take precedence over other methods is needed; it is not always obvious. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 21, 2013 at 5:53 PM, Gun Akkor wrote: Hello Yago, To my knowledge, in facet calculations docValues take precedence over other methods. So, even if your field is also stored and indexed, your facets won't use the inverted index or fieldValueCache when docValues are present. I think you will still have to store and index the field to maintain your other functionality. DocValues are helpful only for facets and sorting, to my knowledge. Hope this helps, Gun Akkor www.carbonblack.com (http://www.carbonblack.com) Sent from my iPhone On Oct 21, 2013, at 12:41 PM, Yago Riveiro yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote: Sorry if I didn't make myself understood; my English is not too good. My goal is to remove pressure from the heap: my indexes are too big, the heap gets full very quickly, and I get an OOM. I read about docValues stored on disk, but I don't know how to configure it. I read this link: https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues which has an example of how to configure a field to use docValues: <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" /> With this configuration it is obvious that I will use docValues. Q: With this configuration, can I retrieve the field value in a normal search, or does it still need to be stored? If I have a field configured as: <field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" /> and I do a facet query on the manu_exact field: q=*:*&facet=true&facet.field=manu_exact Q: Do I leverage the docValues feature? That is, does docValues always take precedence over the regular faceting method when it is set? Q: Does it make sense for the field to be indexed if I have docValues? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote: I really don't understand the question. What behavior are you seeing that leads you to ask? bq: Is it necessary to duplicate the field and set indexed and stored to false and If this means setting _both_ indexed and stored to false, then you effectively throw the field completely away, there's no point in doing this. FWIW, Erick On Mon, Oct 21, 2013 at 1:39 PM, yriveiro yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote: Hi, If I have a field (named dv_field) configured to be indexed, stored and with docValues=true, how do I know that when I do a query like q=*:*&facet=true&facet.field=dv_field I'm really using the docValues and not the normal way? Is it necessary to duplicate the field and set indexed and stored to false, leaving the docValues property set to true? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
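For completeness, the facet query discussed in this thread looks like the following in SolrJ 4.x. This is only a sketch - the URL and collection name are placeholders - and, per the above, the faceting reads docValues whenever the field has them:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DocValuesFacetDemo {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                  // facet counts only, no documents
        query.setFacet(true);
        query.addFacetField("manu_exact"); // equivalent to facet.field=manu_exact
        QueryResponse response = server.query(query);
        for (FacetField.Count count : response.getFacetField("manu_exact").getValues()) {
          System.out.println(count.getName() + ": " + count.getCount());
        }
        server.shutdown();
      }
    }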
Re: Exact Match Results
Hi, I am using the following field type configuration:
<field name="fsw_title" type="text_full_startwith_match" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<fieldType name="text_full_startwith_match" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms_fsw.txt" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
-- View this message in context: http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096847.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
Does anyone have a good link to a guide / tutorial /etc. for writing a custom function query in Solr 4? The tutorials I've seen vary from showing half the code to being written for older versions of Solr. Any type of pointers would be appreciated, thanks.
Re: Solr timeout after reboot
Hi Michael, I agree with Shawn, don't listen to Peter ;) but only this once - he's a smart guy, as you can see in list archives. And I disagree with Shawn, again, only just this once and only somewhat. :) Because: In general, Shawn's advice is correct, but we have no way of knowing your particular details. To illustrate the point, let me use an extreme case where you have just one query that you hammer your servers with. Your Solr caches will be well utilized and your servers will not really need a lot of memory to cache your 100GB index, because only a small portion of it will ever be accessed. Of course, this is an extreme case and not realistic, but I think it helps one understand how, as the number of distinct queries grows (and thus also the number of distinct documents being matched and returned), the need for more and more memory goes up. So the question is where exactly your particular application falls. You mentioned stress testing. Just as you have a real index there (I am assuming), you need to have your real queries, too - real volume, real diversity, real rate, real complexity, real or as close to real everything. Since you are using SPM, you should be able to go to various graphs in SPM and look for a little ambulance icon above each graph. Use that to assemble a message with the N graphs you want us to look at and we'll be able to help more. Graphs that may be of interest here are your Solr cache graphs, disk IO, and memory graphs -- taken during your realistic stress testing, of course. You can then send that message directly to solr-user, assuming your SPM account email address is subscribed to the list. Or you can paste it into a new email, up to you. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Oct 21, 2013 at 11:07 AM, Shawn Heisey s...@elyograg.org wrote: On 10/21/2013 8:03 AM, michael.boom wrote: I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G, so I guess running the above command would bite all available memory. With a 100GB index, I would want a minimum server memory size of 64GB, and I would much prefer 128GB. If you shard your index, then each machine will require less memory, because each one will have less of the index onboard. Running a big Solr install is usually best handled on bare metal, because it loves RAM, and getting a lot of memory in a virtual environment is quite expensive. It's also expensive on bare metal, but unlike Amazon, more memory doesn't increase your monthly cost. With only 15GB total RAM and an index that big, you're probably giving at least half of your RAM to Solr, leaving *very* little for the OS disk cache compared to your index size. The ideal cache size is the same as your index size, but you can almost always get away with less. http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache If you try the cat trick with your numbers, it's going to take forever every time you run it, it will kill your performance while it's happening, and only the last few GB that it reads will remain in the OS disk cache. Chances are that it will be the wrong part of the index, too. You only want to cat your entire index if you have enough free RAM to *FIT* your entire index.
If you *DO* have that much free memory (which for you would require a total RAM size of about 128GB), then the first time will take quite a while, but every time you do it after that, it will happen nearly instantly, because it will not have to actually read the disk at all. You could try only doing the cat on certain index files, but when you don't have enough cache for the entire index, running queries will do a better job of filling the cache intelligently. The first bunch of queries will be slow. Summary: You need more RAM. Quite a bit more RAM. Thanks, Shawn
Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
Take a look at the unit tests for various value sources, and find a Jira that added some value source and look at the patch for what changes had to be made. -- Jack Krupansky -Original Message- From: JT Sent: Monday, October 21, 2013 1:17 PM To: solr-user@lucene.apache.org Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ? Does anyone have a good link to a guide / tutorial /etc. for writing a custom function query in Solr 4? The tutorials I've seen vary from showing half the code to being written for older versions of Solr. Any type of pointers would be appreciated, thanks.
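As a starting skeleton (not from any official tutorial - the class and function names here are invented), a custom function query in Solr 4.3+ boils down to a ValueSourceParser that returns a ValueSource. The sketch below simply returns the string length of its argument:

    import org.apache.lucene.queries.function.FunctionValues;
    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queries.function.valuesource.SimpleFloatFunction;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.SyntaxError;
    import org.apache.solr.search.ValueSourceParser;

    public class StrLengthValueSourceParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource source = fp.parseValueSource(); // the wrapped field or function
        return new SimpleFloatFunction(source) {
          @Override
          protected String name() {
            return "strlength";
          }
          @Override
          protected float func(int doc, FunctionValues vals) {
            String value = vals.strVal(doc);
            return value == null ? 0f : value.length();
          }
        };
      }
    }

Assuming it compiles against your Solr version, it would be registered in solrconfig.xml with a valueSourceParser element (e.g. name="strlength" pointing at the class), making strlength(somefield) usable in function queries and sorts. Verify the details against the value-source unit tests mentioned above before relying on it.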
Re: SolrCloud performance in VM environment
On 10/21/2013 9:48 AM, Tom Mortimer wrote: Hi everyone, I've been working on an installation recently which uses SolrCloud to index 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2 identical VMs set up for replicas). The reason we're using so many shards for a relatively small index is that there are complex filtering requirements at search time, to restrict users to items they are licensed to view. Initial tests demonstrated that multiple shards would be required. The total size of the index is about 140GB, and each VM has 16GB RAM (32GB total) and 4 CPU units. I know this is far under what would normally be recommended for an index of this size, and I'm working on persuading the customer to increase the RAM (basically, telling them it won't work otherwise). Performance is currently pretty poor and I would expect more RAM to improve things. However, there are a couple of other oddities which concern me. Running multiple shards like you are, where each operating system is handling more than one shard, is only going to perform better if your query volume is low and you have lots of CPU cores. If your query volume is high or you only have 2-4 CPU cores on each VM, you might be better off with fewer shards or not sharded at all. The way that I read this is that you've got two physical machines with 32GB RAM, each running two VMs that have 16GB. Each VM houses 4 shards, or 70GB of index. There's a scenario that might be better if all of the following are true:
1) I'm right about how your hardware is provisioned.
2) You or the client owns the hardware.
3) You have an extremely low-end third machine available - single CPU with 1GB of RAM would probably be enough.
In this scenario, you run one Solr instance and one zookeeper instance on each of your two big machines, and use the third wimpy machine as a third zookeeper node. No virtualization. For the rest of my reply, I'm assuming that you haven't taken this step, but it will probably apply either way. The first is that I've been reindexing a fixed set of 500 docs to test indexing and commit performance (with soft commits within 60s). The time taken to complete a hard commit after this is longer than I'd expect, and highly variable - from 10s to 70s. This makes me wonder whether the SAN (which provides all the storage for these VMs and the customer's several other VMs) is being saturated periodically. I grabbed some iostat output on different occasions to (possibly) show the variability:
Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      64.50  0.00        2476.00     0         4952
...
sdb      8.90   0.00        348.00      0         6960
...
sdb      1.15   0.00        43.20       0         864
There are two likely possibilities for this. One or both of them might be in play.
1) Because the OS disk cache is small, not much of the index can be cached. This can result in a lot of disk I/O for a commit, slowing things way down. Increasing the size of the OS disk cache is really the only solution for that.
2) Cache autowarming, particularly the filter cache. In the cache statistics, you can see how long each cache took to warm up after the last searcher was opened. The solution for that is to reduce the autowarmCount values.
The other thing that confuses me is that after a Solr restart or hard commit, search times average about 1.2s under light load. After searching the same set of queries for 5-6 iterations this improves to 0.1s. However, in either case - cold or warm - iostat reports no device reads at all:
Device:  tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sdb      0.40  0.00        8.00        0         160
...
sdb      0.30  0.00        10.40       0         104
(the writes are due to logging). This implies to me that the 'hot' blocks are being completely cached in RAM - so why the variation in search time and the number of iterations required to speed it up? Linux is pretty good about making limited OS disk cache resources work. Sounds like the caching is working reasonably well for queries. It might not be working so well for updates or commits, though. Running multiple Solr JVMs per machine, virtual or not, causes more problems than it solves. Solr has no limits on the number of cores (shard replicas) per instance, assuming there are enough system resources. There should be exactly one Solr JVM per operating system. Running more than one results in quite a lot of overhead, and your memory is precious. When you create a collection, you can give the collections API the maxShardsPerNode parameter to create more than one shard per instance. I don't have a great deal of experience in low-level performance tuning, so please forgive any naivety. Any ideas of what to do next would be greatly appreciated. I don't currently have details of the VM implementation but can get hold of this if it's relevant. thanks, Tom
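For reference, maxShardsPerNode is passed on the CREATE call. An illustrative invocation (host and collection name are hypothetical) that packs 8 shards onto 2 nodes: http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=1&maxShardsPerNode=4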
Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
Hi Jack, Do you have a date for the new version of your book: solr_4x_deep_dive_early_access? Thanks, Fudong On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky j...@basetechnology.com wrote: Take a look at the unit tests for various value sources, and find a Jira that added some value source and look at the patch for what changes had to be made. -- Jack Krupansky -Original Message- From: JT Sent: Monday, October 21, 2013 1:17 PM To: solr-user@lucene.apache.org Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ? Does anyone have a good link to a guide / tutorial / etc. for writing a custom function query in Solr 4? The tutorials I've seen vary from showing half the code to being written for older versions of Solr. Any type of pointers would be appreciated, thanks.
Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
Hopefully at the end of the week. -- Jack Krupansky -Original Message- From: fudong li Sent: Monday, October 21, 2013 1:45 PM To: solr-user@lucene.apache.org Subject: Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ? Hi Jack, Do you have a date for the new version of your book: solr_4x_deep_dive_early_access? Thanks, Fudong On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky j...@basetechnology.comwrote: Take a look at the unit tests for various value sources, and find a Jira that added some value source and look at the patch for what changes had to be made. -- Jack Krupansky -Original Message- From: JT Sent: Monday, October 21, 2013 1:17 PM To: solr-user@lucene.apache.org Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ? Does anyone have a good link to a guide / tutorial /etc. for writing a custom function query in Solr 4? The tutorials I've seen vary from showing half the code to being written for older versions of Solr. Any type of pointers would be appreciated, thanks.
reindexing data
In Solr 4.5, I'm trying to create a new collection on the fly. I have a data dir with the index that should be in there, but the CREATE command makes the directory be: collection name_shard1_replicant# I was hoping that making a collection named something would use a directory with that name, to let me use the data that I already have to fill the collection. I could go and just make each one (name_shard_replicant[1,2,3]), but I was hoping there may be an easier way of doing this. Sorry if this is confusing (it is Monday); I can try to clarify if needed. Thanks. -- Chris
Re: Questions developing custom functionquery
I would agree the right way to do this is probably just to add the information I wish to sort on directly, as a date field or something like that. The issue is we currently have ~300M documents that are already indexed. Not all of the fields have stored=true (for good reason: we maintain the documents externally, about 7TB worth, and I didn't want to replicate 7TB of data twice), so we cannot update these indexed values. I was hoping to spend 2-3 days writing a custom query to avoid 2+ months of indexing everything all over again. So let me just ask this question: given my current situation, let's say you had the following field: <str name="resourcename">/path/to/file/month/day/year/file.txt</str> I simply want to extract the month/day/year and sort based on that. My current plan was to convert the month, day, year into seconds from right now, and return that number. Thus sorting ascending, it should return newest documents first. -JT On Fri, Oct 18, 2013 at 3:14 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Field-Type: org.apache.solr.schema.TextField ... : DocTermsIndexDocValues http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java#DocTermsIndexDocValues . : Calling getVal() on a DocTermsIndexDocValues does some really weird stuff : that I really don't understand. Your TextField is being analyzed in some way you haven't clarified, and the DocTermsIndexDocValues you get contains the details of each term in that TextField : It's possible I'm going about this wrong and need to re-do my approach. I'm : just currently at a loss for what that approach is. Based on your initial goal, you are most certainly going about this in a much more complicated way than you need to... : My goal is to be able to implement a custom sorting technique. : Example: <str name="resname">/some example/data/here/2013/09/12/testing.text</str> : I would like to do a custom sort based on this resname field. : Basically, I would like to parse out that date there (2013/09/12) and sort : on that date. You are going to be *MUCH* happier (both in terms of effort, and in terms of performance) if, instead of writing a custom function to parse strings at query time when sorting, you implement the parsing logic when indexing the doc and index it up front as a date field that you can sort on. I would suggest something like CloneFieldUpdateProcessorFactory + RegexReplaceProcessorFactory could save you the work of needing to implement any custom logic -- but as Jack pointed out in SOLR-4864 it doesn't currently allow you to do capture group replacements (but maybe you could contribute a patch to fix that instead of needing to write completely custom code for yourself). Or maybe, as is, you could use RegexReplaceProcessorFactory to throw away non-digits and then use ParseDateFieldUpdateProcessorFactory to get what you want? (I'm not certain - i haven't played with ParseDateFieldUpdateProcessorFactory much) https://issues.apache.org/jira/browse/SOLR-4864 https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html -Hoss
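If the custom-function route wins out, the heart of it is just turning the path segment into a number that sorts correctly. A rough sketch of that piece, assuming the yyyy/MM/dd layout from the quoted example (class and method names are invented; this logic would sit inside the ValueSource's per-document method):

    import java.text.SimpleDateFormat;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PathDateSortKey {
      // matches .../2013/09/12/testing.text style paths
      private static final Pattern DATE = Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/[^/]+$");

      /** Seconds elapsed from the path's date to now; smaller = newer, so ascending sort returns newest first. */
      public static long secondsSinceDocDate(String resourcename) throws java.text.ParseException {
        Matcher m = DATE.matcher(resourcename);
        if (!m.find()) {
          return Long.MAX_VALUE; // undated documents sort last in ascending order
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd");
        long docMillis = fmt.parse(m.group(1) + "/" + m.group(2) + "/" + m.group(3)).getTime();
        return (System.currentTimeMillis() - docMillis) / 1000L;
      }
    }

Hoss's point stands, though: doing this once at index time, into a real date field, is far cheaper than re-parsing the string for every document on every sorted query.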
How to extract a field with a prefixed dimension?
Hi, I'm new to Solr. I use the content field to extract the text of Solr documents, but this field is too long. Is there a way to extract only a substring of this field? I make my query in Java as follows:
SolrQuery querySolr = new SolrQuery();
querySolr.setQuery("*:*");
querySolr.setRows(3);
querySolr.setParam("wt", "json");
querySolr.addField("content");
querySolr.addField("title");
querySolr.addField("url");
Any ideas? Thanks, Danilo -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-extract-a-field-with-a-prefixed-dimension-tp4096877.html Sent from the Solr - User mailing list archive at Nabble.com.
External Zookeeper and JBOSS
When I use the Zookeeper CLI utility, I'm not sure if the configuration is uploading correctly. How can I tell? This is the command I am issuing - ./zkCli.sh -cmd upconfig -server 127.0.0.1:2181 -confdir /data/v8p/solr/root/conf -confname defaultconfig -solrhome /data/v8p/solr Then checking with this - [zk: localhost:2181(CONNECTED) 0] ls / [aliases.json, live_nodes, overseer, overseer_elect, collections, zookeeper, clusterstate.json] But I don't see any config node. One thing to note - I have multiple cores, but the configs are located in a common dir. Maybe that is causing a problem. solr.xml [simplified by removing additional cores]:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.101:2181">
  <cores adminPath="/admin/cores">
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp" dataDir="/data/v8p/solr/wdsp2/data/" />
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp2" dataDir="/data/v8p/solr/wdsp/data/" />
  </cores>
</solr>
Am I overlooking something obvious? Thanks! Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
RE: External Zookeeper and JBOSS
I've made progress... Rather than using the zkCli.sh in the Zookeeper bin folder, I used the Java libs from Solr, and the config now shows up. Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham -Original Message- From: Branham, Jeremy [HR] Sent: Monday, October 21, 2013 2:20 PM To: SOLR User distro (solr-user@lucene.apache.org) Subject: External Zookeeper and JBOSS When I use the Zookeeper CLI utility, I'm not sure if the configuration is uploading correctly. How can I tell? This is the command I am issuing - ./zkCli.sh -cmd upconfig -server 127.0.0.1:2181 -confdir /data/v8p/solr/root/conf -confname defaultconfig -solrhome /data/v8p/solr Then checking with this - [zk: localhost:2181(CONNECTED) 0] ls / [aliases.json, live_nodes, overseer, overseer_elect, collections, zookeeper, clusterstate.json] But I don't see any config node. One thing to note - I have multiple cores, but the configs are located in a common dir. Maybe that is causing a problem. solr.xml [simplified by removing additional cores]:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.101:2181">
  <cores adminPath="/admin/cores">
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp" dataDir="/data/v8p/solr/wdsp2/data/" />
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp2" dataDir="/data/v8p/solr/wdsp/data/" />
  </cores>
</solr>
Am I overlooking something obvious? Thanks! Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Re: External Zookeeper and JBOSS
On 10/21/2013 1:19 PM, Branham, Jeremy [HR] wrote: solr.xml [simplified by removing additional cores]:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.101:2181">
  <cores adminPath="/admin/cores">
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp" dataDir="/data/v8p/solr/wdsp2/data/" />
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp2" dataDir="/data/v8p/solr/wdsp/data/" />
  </cores>
</solr>
These cores that you have listed here do not look like SolrCloud-related cores, because they do not reference a collection or a shard. Here's what I've got on a 4.2.1 box where all cores were automatically created by the CREATE action on the collections API:
<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="eatatjoes_shard1_replica2/" transient="false" name="eatatjoes_shard1_replica2" config="solrconfig.xml" collection="eatatjoes"/>
<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="test3_shard1_replica1/" transient="false" name="test3_shard1_replica1" config="solrconfig.xml" collection="test3"/>
<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="smb2_shard1_replica1/" transient="false" name="smb2_shard1_replica1" config="solrconfig.xml" collection="smb2"/>
On the command-line script -- the zkCli.sh script comes with Zookeeper, but it is not aware of anything having to do with SolrCloud. There is another script named zkcli.sh (note the lowercase C) that comes with the Solr example (in example/cloud-scripts) - it's a very different script and will accept the options that you tried to give. I do wonder how much pain would be caused by renaming the Solr zkcli script so it's not so similar to the one that comes with Zookeeper. Thanks, Shawn
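For anyone finding this thread later, the Solr variant of the command would look roughly like the following; the flags are per the Solr 4.x cloud-scripts version, and the paths and config name are simply Jeremy's values re-used: example/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd upconfig -confdir /data/v8p/solr/root/conf -confname defaultconfig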
Major GC does not reduce the old gen size
Hello everyone, We are using Solr 4.4 in production with 4 shards. These are our memory settings:
-d64 -server -Xms8192m -Xmx12288m -XX:MaxPermSize=256m \
-XX:NewRatio=1 -XX:SurvivorRatio=6 \
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10 -XX:+CMSIncrementalPacing \
-XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC \
-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC \
-XX:+UseLargePages \
-XX:+UseParNewGC \
-XX:ConcGCThreads=10 \
-XX:ParallelGCThreads=10 \
-XX:MaxGCPauseMillis=3 \
I notice in production that the old generation becomes full and no amount of garbage collection will free up the memory. This is similar to the issue discussed in this link: http://grokbase.com/t/lucene/solr-user/12bwydq5jr/permanently-full-old-generation Did anyone have this problem? Can you please point out anything wrong with the GC configuration? -- View this message in context: http://lucene.472066.n3.nabble.com/Major-GC-does-not-reduce-the-old-gen-size-tp4096880.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: measure result set quality
Thanks for your valuable answers. As a first approach I will evaluate (manually :( ) hits that are outside the intersection set for every query in each system. Anyway, I will keep searching for literature in the field. Regards. On Sun, Oct 20, 2013 at 10:55 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: That's exactly what we advocate for in our Solr work. We call it Test Driven Relevancy. We work closely with content experts to help build collaboration around search quality. (Disclaimer: yes, we build a product around this, but the advice still stands regardless.) http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/ Cheers -Doug Turnbull Search Relevancy Expert OpenSource Connections On Sun, Oct 20, 2013 at 4:21 PM, Furkan KAMACI furkankam...@gmail.com wrote: Let's assume that you have keywords to search and different configurations for indexing. A/B testing is one of the techniques that you can use, as Erick mentioned. If you want to have an automated comparison and do not have an oracle for A/B testing, there is another way. If you have an ideal result list, you can compare the similarity between your different configurations' results and that ideal result list. The ideal result list can be created by an expert, just once. If you are developing a search engine, you can search the same keywords on one of the existing search engines and use those results as the ideal result list to measure your result lists' similarities. Kendall's tau is one of the methods to use for such situations. If you do not have any document duplication in your index (without any other versions) I suggest using tau-a. If you explain your system, and what is good or ideal for you, I can explain more. Thanks; Furkan KAMACI 2013/10/18 Erick Erickson erickerick...@gmail.com bq: How do you compare the quality of your search result in order to decide which schema is better? Well, that's actually a hard problem. There's the various TREC data, but that's a generic solution and most every individual application of this generic thing called search has its own version of good results. Note that scores are NOT comparable across different queries even in the same data set, so don't go down that path. I'd fire the question back at you: can you define what good (or better) results are in such a way that you can program an evaluation? Often the answer is no... One common technique is to have knowledgeable users do what's called A/B testing. You fire the query at two separate Solr instances and display the results side-by-side, and the user says A is more relevant, or B is more relevant. Kind of like an eye doctor. In sophisticated A/B testing, the program randomly changes which side the results go to, so you remove sidedness bias. FWIW, Erick On Thu, Oct 17, 2013 at 11:28 AM, Alvaro Cabrerizo topor...@gmail.com wrote: Hi, Imagine the next situation. You have a corpus of documents and a list of queries extracted from a production environment. The corpus hasn't been manually annotated with relevant/non-relevant tags for every query. Then you configure various Solr instances, changing the schema (adding synonyms, stopwords...). After indexing, you prepare and execute the test over the different schema configurations. How do you compare the quality of your search results in order to decide which schema is better? Regards. -- Doug Turnbull Search Big Data Architect OpenSource Connections http://o19s.com
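To make Furkan's suggestion concrete, here is a small self-contained Java sketch of Kendall's tau-a over two rankings. It is only an illustration (class and method names are made up), and it assumes both rankings contain exactly the same items with no ties - e.g. the intersection set Alvaro plans to evaluate:

    import java.util.Arrays;
    import java.util.List;

    public class KendallTau {
      /** Kendall's tau-a: (concordant - discordant) / (n(n-1)/2), no tie handling. */
      public static double tauA(List<String> rankingA, List<String> rankingB) {
        int n = rankingA.size();
        long concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
          for (int j = i + 1; j < n; j++) {
            // the pair (i, j) is concordant when ranking B orders it the same way as ranking A
            int posI = rankingB.indexOf(rankingA.get(i));
            int posJ = rankingB.indexOf(rankingA.get(j));
            if (posI < posJ) concordant++; else discordant++;
          }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
      }

      public static void main(String[] args) {
        List<String> ideal = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        List<String> system = Arrays.asList("doc2", "doc1", "doc3", "doc4");
        System.out.println(tauA(ideal, system)); // one swapped pair out of six -> 0.666...
      }
    }

A score of 1.0 means the two rankings are identical and -1.0 means completely reversed, so the configuration whose tau against the ideal list is higher wins.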
Re: Exact Match Results
For exact phrase match you can wrap the query inside quotes, but this will perform the exact match and it won't match other results. The query below will match only: "Okkadu telugu movie stills" http://localhost:8983/solr/core1/select?q=%22okkadu%20telugu%20movie%20stills%22 Since you are using the EdgeNGram filter, it produces many tokens (as below), so you might not get the desired output. You can try using the shingle factory with a standard analyzer instead of the EdgeNGram filter.
o [6f] 0 26 1 1 word
ok [6f 6b] 0 26 1 1 word
okk [6f 6b 6b] 0 26 1 1 word
okka [6f 6b 6b 61] 0 26 1 1 word
okkad [6f 6b 6b 61 64] 0 26 1 1 word
okkadu [6f 6b 6b 61 64 75] 0 26 1 1 word
okkadu [6f 6b 6b 61 64 75 20] 0 26 1 1 word
okkadu t [6f 6b 6b 61 64 75 20 74] 0 26 1 1 word
okkadu te [6f 6b 6b 61 64 75 20 74 65] 0 26 1 1 word
okkadu tel [6f 6b 6b 61 64 75 20 74 65 6c] 0 26 1 1 word
-- View this message in context: http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096906.html Sent from the Solr - User mailing list archive at Nabble.com.
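The same query expressed through SolrJ 4.x, as a minimal sketch; the core name matches the URL above, and the rest is boilerplate:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ExactPhraseSearch {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        // quoting the whole phrase makes the parser require the terms as a sequence
        SolrQuery query = new SolrQuery("\"okkadu telugu movie stills\"");
        QueryResponse response = server.query(query);
        System.out.println("hits: " + response.getResults().getNumFound());
        server.shutdown();
      }
    }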