Use of multiple Tomcat instances and shards.

2011-03-07 Thread rajini maski
  In order to increase the Java heap memory: I have only 2 GB of RAM, so
my default memory configuration is --JvmMs 128 --JvmMx 512. I have a
single Solr data index of up to 6 GB. If I fire searches against this index
very often, after some time I get a "java heap space out of memory" error
and the search does not return results. What are the possibilities for
fixing this error? (I cannot increase the heap memory.) Would running
another Tomcat instance help (and how does that work?), or is it done by
configuring shards? Which of these might fix the failing searches?


Rajani
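
For context, distributed search spreads one logical index over several instances and fans each query out with the shards parameter; a sketch (host names, ports, and the query are placeholders):

```
http://host1:8080/solr/select?q=engines&shards=host1:8080/solr,host2:8080/solr
```

Each shard then holds only part of the 6 GB index, so each JVM needs to keep less data hot; a second Tomcat instance only helps with heap pressure if the index is actually split between the instances rather than copied.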


Re: New PHP API for Solr (Logic Solr API)

2011-03-07 Thread Burak

On 03/07/2011 12:43 AM, Stefan Matheis wrote:

Burak,

what's wrong with the existing PHP-Extension
(http://php.net/manual/en/book.solr.php)?
I think "wrong" is not the appropriate word here. But if I had to 
summarize why I wrote this API:


* Not everybody is enthusiastic about adding another item to an already 
long list of server dependencies. I just wanted a pure PHP option.
* I am not a C programmer either so the ability to understand the source 
code and modify it according to my needs is another advantage.
* Yes, a PECL package would be faster. However, in 99% of the cases, 
after everything is said, coded, and byte-code cached, my biggest 
bottlenecks end up being the database and network.

* Last of all, choice is what open source means to me.

Burak










Re: logical relation among filter queries

2011-03-07 Thread Jayendra Patil
You can use boolean operators in the filter query.

e.g. fq=rating:(PG-13 OR R)

Regards,
Jayendra

On Mon, Mar 7, 2011 at 9:25 PM, cyang2010  wrote:
> I wonder what is the logical relation among filter queries.  I can't find
> much documentation on filter query.
>
> for example,  i want to find all titles that is either PG-13 or R through
> filter query.   The following query won't give me any result back.  So I
> suppose by default it is intersection among each filter query result?
>
> &fq=rating:PG-13&fq=rating:R&q=*:*
>
>
> How do i change it to union to include value for each filter query result?
>
> Thanks.
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2649142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
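
As a side note, the difference between the two forms is easy to see by building the request URLs; a quick sketch in Python (host, core, and field values are just placeholders):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/select"  # placeholder host/core

# Two separate fq parameters are intersected (ANDed) by Solr -- this
# returns nothing, since no document is rated both PG-13 and R:
intersect = urlencode([("q", "*:*"), ("fq", "rating:PG-13"), ("fq", "rating:R")])

# One fq with OR inside gives the union of the two ratings:
union = urlencode([("q", "*:*"), ("fq", "rating:(PG-13 OR R)")])

print(base + "?" + intersect)
print(base + "?" + union)
```

Each fq clause is cached independently in the filterCache, so a single OR'd fq and two separate fqs also have different caching behavior.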


logical relation among filter queries

2011-03-07 Thread cyang2010
I wonder what the logical relation among filter queries is.  I can't find
much documentation on filter queries.

For example, I want to find all titles that are rated either PG-13 or R through
filter queries.   The following query won't give me any results back, so I
suppose by default it is an intersection of the individual filter query results?

&fq=rating:PG-13&fq=rating:R&q=*:*


How do I change it to a union that includes the results of each filter query?

Thanks.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2649142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Robert Muir
On Mon, Mar 7, 2011 at 7:01 PM, Andy  wrote:
> Thanks. Please tell me more about the tables/software that does the 
> conversion. Really appreciate your help.
>

also you might be interested in this example:



http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTransformFilterFactory
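
The linked filter can be dropped into a field type; a sketch of a schema.xml fragment, assuming the ICU analysis jars are on the classpath (the field type name is made up):

```
<fieldType name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalize traditional characters to simplified at index and query time -->
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs on both documents and queries, 類 and 类 end up as the same indexed term.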


Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread François Schiettecatte
Here are a bunch of resources which will help:


This does TC <=> SC conversions:


http://search.cpan.org/~audreyt/Encode-HanConvert-0.35/lib/Encode/HanConvert.pm


This has a TC <=> SC converter in there somewhere:

http://www.mediawiki.org/wiki/MediaWiki


This explains some of the issues behind TC <=> SC conversions:

http://people.w3.org/rishida/scripts/chinese/


Misc tools:

http://mandarintools.com/


François


On Mar 7, 2011, at 7:01 PM, Andy wrote:

> Thanks. Please tell me more about the tables/software that does the 
> conversion. Really appreciate your help.
> 
> 
> --- On Mon, 3/7/11, François Schiettecatte  wrote:
> 
>> From: François Schiettecatte 
>> Subject: Re: How to handle searches across traditional and simplified 
>> Chinese?
>> To: solr-user@lucene.apache.org
>> Date: Monday, March 7, 2011, 5:24 PM
>> I did a little research into this for
>> a client a while. The character mapping is not one to one
>> which complicates things (TC and SC have evolved
>> independently) and if you want to do a perfect job you will
>> need a dictionary. However there are tables out there (I can
>> dig one up for you) that allow conversion from one to the
>> other. So you would pick either TC or SC as your canonical
>> Chinese, and just convert all the documents and searches to
>> it.
>> 
>> I will stress that this is very much a brute force
>> approach, the mapping is not perfect and the two character
>> sets have evolved (much like UK and US English, I was
>> brought up in the UK and live in the US).
>> 
>> Hope this helps.
>> 
>> Cheers
>> 
>> François
>> 
>> On Mar 7, 2011, at 5:02 PM, Andy wrote:
>> 
>>> I have documents that contain both simplified and
>> traditional Chinese characters. Is there any way to search
>> across them? For example, if someone searches for 类
>> (simplified Chinese), I'd like to be able to recognize that
>> the equivalent character is 類 in traditional Chinese and
>> search for 类 or 類 in the documents. 
>>> 
>>> Is that something that Solr, or any related software,
>> can do? Is there a standard approach in dealing with this
>> problem?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 



Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Andy
Thanks. Please tell me more about the tables/software that does the conversion. 
Really appreciate your help.


--- On Mon, 3/7/11, François Schiettecatte  wrote:

> From: François Schiettecatte 
> Subject: Re: How to handle searches across traditional and simplified Chinese?
> To: solr-user@lucene.apache.org
> Date: Monday, March 7, 2011, 5:24 PM
> I did a little research into this for
> a client a while. The character mapping is not one to one
> which complicates things (TC and SC have evolved
> independently) and if you want to do a perfect job you will
> need a dictionary. However there are tables out there (I can
> dig one up for you) that allow conversion from one to the
> other. So you would pick either TC or SC as your canonical
> Chinese, and just convert all the documents and searches to
> it.
> 
> I will stress that this is very much a brute force
> approach, the mapping is not perfect and the two character
> sets have evolved (much like UK and US English, I was
> brought up in the UK and live in the US).
> 
> Hope this helps.
> 
> Cheers
> 
> François
> 
> On Mar 7, 2011, at 5:02 PM, Andy wrote:
> 
> > I have documents that contain both simplified and
> traditional Chinese characters. Is there any way to search
> across them? For example, if someone searches for 类
> (simplified Chinese), I'd like to be able to recognize that
> the equivalent character is 類 in traditional Chinese and
> search for 类 or 類 in the documents. 
> > 
> > Is that something that Solr, or any related software,
> can do? Is there a standard approach in dealing with this
> problem?
> > 
> > Thanks.
> > 
> > 
> > 
> 
> 





Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread François Schiettecatte
I did a little research into this for a client a while back. The character mapping 
is not one to one which complicates things (TC and SC have evolved 
independently) and if you want to do a perfect job you will need a dictionary. 
However there are tables out there (I can dig one up for you) that allow 
conversion from one to the other. So you would pick either TC or SC as your 
canonical Chinese, and just convert all the documents and searches to it.

I will stress that this is very much a brute force approach, the mapping is not 
perfect and the two character sets have evolved (much like UK and US English, I 
was brought up in the UK and live in the US).

Hope this helps.

Cheers

François

On Mar 7, 2011, at 5:02 PM, Andy wrote:

> I have documents that contain both simplified and traditional Chinese 
> characters. Is there any way to search across them? For example, if someone 
> searches for 类 (simplified Chinese), I'd like to be able to recognize that 
> the equivalent character is 類 in traditional Chinese and search for 类 or 類 in 
> the documents. 
> 
> Is that something that Solr, or any related software, can do? Is there a 
> standard approach in dealing with this problem?
> 
> Thanks.
> 
> 
> 
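
The brute-force canonicalization described above boils down to a character table; a tiny sketch in Python (the table here holds only a few illustrative pairs — a real deployment needs a full mapping, and ideally a dictionary for the one-to-many cases):

```python
# Tiny illustrative TC -> SC table; real tables have thousands of entries.
TC_TO_SC = {"類": "类", "國": "国", "學": "学"}

def to_simplified(text: str) -> str:
    # Pick SC as the canonical form and map every TC character onto it;
    # characters with no entry pass through unchanged.
    return "".join(TC_TO_SC.get(ch, ch) for ch in text)

# Running this over both documents and queries makes a TC query match SC text.
print(to_simplified("類"))
```

In Solr terms, the same normalization would sit in a char filter or token filter applied at both index and query time.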



How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Andy
I have documents that contain both simplified and traditional Chinese 
characters. Is there any way to search across them? For example, if someone 
searches for 类 (simplified Chinese), I'd like to be able to recognize that the 
equivalent character is 類 in traditional Chinese and search for 类 or 類 in the 
documents. 

Is that something that Solr, or any related software, can do? Is there a 
standard approach in dealing with this problem?

Thanks.





Re: Looking for a Lucene/Solr Contractor

2011-03-07 Thread Jan Høydahl
Please check http://wiki.apache.org/solr/Support and 
http://wiki.apache.org/lucene-java/Support for a list of companies you may 
contact.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7. mars 2011, at 19.40, Drew Kutcharian wrote:

> Hi Everyone,
> 
> We are looking for someone to help us build a similarity engine. Here are 
> some preliminary specs for the project.
> 
> 1) We want to be able to show similar posts when a user posts a new block of 
> text. A good example of this is StackOverflow. When a user tries to ask a new 
> question, the system displays similar questions.
> 
> 2) This is for a messaging system, so indexing/analysis should happen 
> preferably at the time of posting, not later.
> 
> 3) The posts are going to be less than 1000 characters.
> 
> 4) We anticipate to have a millions of posts so the solution should consider 
> sharding techniques to shard the indexes on many machines.
> 
> 5) The solution can be delivered as a stand alone Java SE solution which can 
> be run from the command line, no web development necessary.
> 
> 6) We expect clean APIs.
> 
> Thanks,
> 
> Drew



Looking for a Lucene/Solr Contractor

2011-03-07 Thread Drew Kutcharian
Hi Everyone,

We are looking for someone to help us build a similarity engine. Here are some 
preliminary specs for the project.

1) We want to be able to show similar posts when a user posts a new block of 
text. A good example of this is StackOverflow. When a user tries to ask a new 
question, the system displays similar questions.

2) This is for a messaging system, so indexing/analysis should happen 
preferably at the time of posting, not later.

3) The posts are going to be less than 1000 characters.

4) We anticipate to have a millions of posts so the solution should consider 
sharding techniques to shard the indexes on many machines.

5) The solution can be delivered as a stand alone Java SE solution which can be 
run from the command line, no web development necessary.

6) We expect clean APIs.

Thanks,

Drew

Solr Cell & DataImport Tika handler broken - fails to index Zip file contents

2011-03-07 Thread Jayendra Patil
Working with the latest Solr trunk code, it seems the Tika handlers
for Solr Cell (ExtractingDocumentLoader.java) and the Data Import Handler
(TikaEntityProcessor.java) fail to index zip file contents again.
They just index the file names again.
This issue was addressed some time back, late last year, but seems to
have reappeared with the latest code.

I had raised a jira for the Data Import handler part with the patch
and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
The same fix is needed for the Solr Cell as well.

I can raise a jira and provide the patch for the same, if the above
patch seems good enough.

Regards,
Jayendra


Re: Trying to use FieldReaderDataSource in DIH

2011-03-07 Thread Jeff Schmidt
I can see that XPathEntityProcessor.init() is using the no-arg version of 
Context.getDataSource(). Since fields are hierarchical, should that not be a 
request for the the current innermost data source (i.e. "fieldSource" which is 
a FieldReaderDataSource)?   Or should init() be looking at the dataSource 
attribute value of the field in order to effectively invoke 
Context.getDataSource("fieldSource")?

It seems I'm obsessing over this "bug" when it's probably some bigger picture 
thing I'm missing.  Given the other examples of using this technique, it's hard 
to believe I'm the first to encounter this issue. :)

Thanks,

Jeff

On Mar 4, 2011, at 10:00 AM, Jeff Schmidt wrote:

> Hello:
> 
> I'm trying to make use of FieldReaderDataSource so that I can read a (Oracle) 
> database CLOB, and then use XPathEntityProcessor to derive Solr field values 
> via xpath notation.
> 
> For an extra bit of fun, the CLOB itself is base 64 encoded and gzip'd.  I 
> created a transformer of my own to take care of the encoding and compression 
> and that seems to work.  I patterned the new transformer after the existing 
> ones (Solr 3.1 trunk).  Anyway, I can see in catalina.out, my own debug 
> output:
> 
> - Processing field: {toWrite=false, clob=true, 
> column=SUMMARY_XML, boost=1.0, gzip64=true}
> - Updated field: SUMMARY_XML to type: java.lang.String value: 
> ' name="LOC677213"/> id="677213" source="EG" species="MM" name="similar to U2AF homology motif 
> (UHM) kinase 1" 
> summary=""/>  
> name="unknown"/>  finding-count="0">©2000-2010  Ingenuity 
> Systems, Inc. All rights reserved.'
> 
> So, the transformer replaces the original CLOB extracted by ClobTransformer 
> with a String representing the decoded result. I then want to feed this XML 
> string to XPathEntityProcessor.  So, in my DIH data config file:
> 
> 
> <dataConfig>
>   <dataSource
>      name="ipsDb"
>      type="JdbcDataSource"
>      driver="oracle.jdbc.driver.OracleDriver"
>      url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))"
>      user="user"
>      password="password"
>   />
> 
>   <dataSource
>      name="fieldSource"
>      type="FieldReaderDataSource"
>   />
> 
>   <document>
>     <entity
>        rootEntity="false"
>        name="ipsNode"
>        dataSource="ipsDb"
>        query="select SUMMARY_XML from IPS_NODE where ROWNUM < 10"
>        transformer="ClobTransformer,com.ingenuity.isec.util.SolrDihGzip64Transformer">
> 
>       <entity
>          name="node"
>          dataSource="fieldSource"
>          dataField="ipsNode.SUMMARY_XML"
>          processor="XPathEntityProcessor"
>          forEach="/node">
>          ...
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> 
> 
> Basically, I'm trying to specify the (former CLOB, now String) SUMMARY_XML 
> field as the data field for the FieldReaderDataSource. I can see it has the 
> ability to simply return a StringReader() for String fields, rather than have 
> to deal with a Clob itself. So, I figured FieldReaderDataSource would be 
> happy with that and it would supply XPathEntityProcessor with XML contained 
> in the field's value.
> 
> But, when I do a full import, I see this:
> 
> Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.DataImporter 
> doFullImport
> INFO: Starting Full Import
> Mar 4, 2011 9:10:26 AM org.apache.solr.core.SolrCore execute
> INFO: [ing-nodes] webapp=/solr path=/select 
> params={clean=false&commit=true&command=full-import&qt=/dataimport-ips} 
> status=0 QTime=31 
> Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.SolrWriter 
> readIndexerProperties
> WARNING: Unable to read: dataimport-ips.properties
> Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
> call
> INFO: Creating a connection for entity ipsNode with URL: 
> jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
> Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
> call
> INFO: Time taken for getConnection(): 1838
> Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
> call
> INFO: Creating a connection for entity node with URL: 
> jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
> Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
> call
> INFO: Time taken for getConnection(): 111

Re: dismax, and too much qf?

2011-03-07 Thread Jeff Schmidt
Hi Jonathan:

On Mar 7, 2011, at 8:33 AM, Jonathan Rochkind wrote:

> I use about that many qf's in Solr 1.4.1.   It works. I'm not entirely sure 
> if it has performance implications -- I do have searching that is somewhat 
> slower then I'd like, but I'm not sure if the lengthy qf is a contributing 
> factor, or other things I'm doing (like a dozen different facet.fields too!). 
>   I haven't profiled everything.  But it doesn't grind my Solr to a halt or 
> anything, it works.

Thanks for the feedback on that. I'll learn more on how this performs in the 
coming months, but if the approach is doomed from the start, that would be good 
to know sooner rather than later, so I could consider doing something else (not 
sure what that would be). It is a pretty big customer requirement though, so 
perhaps it can be carried out regardless by using more EC2 instances? :)

> Separately, I've also been thinking of other ways to get similar highlighting 
> behavior as you describe, give the 'field' that the match was in in the 
> highlight response, but haven't come up with anything great, if your approach 
> works, that's cool.  I've been trying to think of a way to store a single 
> stored field in a structured format (CSV? XML?), and somehow have the 
> highlighter return the complete 'field' that matches, not just the 
> surrounding X words. But haven't gotten anywhere on that, just an idle 
> thought.

That's an interesting idea. There are a number of other highlighting-related 
parameters I've not yet played with, relating to fragment size, snippets, 
max analyzed chars, etc.  Could those get you what you need without having to 
create a separate structured field?

In my case, most of the fields I'm searching are small in size, and I  just 
need to know in what field(s) a match occurred. Often, the actual matched 
characters are less important than the fact that the provided terms matched in 
that field.  

Take it easy,

Jeff

> 
> Jonathan
> 
> On 3/4/2011 10:09 AM, Jeff Schmidt wrote:
>> Hello:
>> 
>> I'm working on implementing a requirement where when a document is returned, 
>> we want to pithily tell the end user why. That is, say, with five documents 
>> returned, they may be so for similar or different reasons. These "reasons" 
>> are the field(s) in which matches occurred.  Some are more important than 
>> others, and I'll have to return just the most relevant one or two reasons to 
>> not overwhelm the user.
>> 
>> This is a separate goal than Solr's scoring of the returned documents. That 
>> is, index/query time boosting can indicate which fields are more significant 
>> in computing the overall document score, but then I need to know what fields 
>> where, matched with what terms. I do have an application that stands between 
>> Solr and the end user (RESTful API), so I figured I can rank the "reasons" 
>> and return more domain specific names rather than the Solr fields names.
>> 
>> So, I've turned to highlighting, and in the results I can see for each 
>> document ID the fields matched, and the text in the field etc. Great. But,  
>> to get that to work, I have to specifically query individual fields. That 
>> is, the approach of <copyField>'ing a bunch of fields to a common text field 
>> for efficiency purposes is no longer an option. And, using the dismax 
>> request handler, I'm querying a lot of fields:
>> 
>>  <str name="qf">
>> n_nameExact^4.0
>> n_macromolecule_nameExact^3.0
>> n_macromolecule_name^2.0
>> n_macromolecule_id^1.8
>> n_pathway_nameExact^1.5
>> n_top_regulates
>> n_top_regulated_by
>> n_top_binds
>> n_top_role_in_cell
>> n_top_disease
>> n_molecular_function
>> n_protein_family
>> n_subcell_location
>> n_pathway_name
>> n_cell_component
>> n_bio_process
>> n_synonym^0.5
>> n_macromolecule_summary^0.6
>> p_nameExact^4.0
>> p_name^2.0
>> p_description^0.6
>>  </str>
>> 
>> Is that crazy?  Is telling Solr to look at so many individual fields going 
>> to be a performance problem?  I'm only prototyping at this stage and it 
>> works great. :)  I've not run anything yet at scale handling lots of 
>> requests.
>> 
>> There are two document types in that shared index, demarcated using a field 
>> named type.  So, when configuring the SolrJ SolrQuery, I do setup 
>> addFilterQuery() to select one or the other type.
>> 
>> Anyway, using dismax with all of those query fields along with highlighting, 
>> I get the information I need to render meaningful results for the end user.  
>> But, it has a sort of smell to it. :)   Shall I look for another way, or am 
>> I worrying about nothing?
>> 
>> I am current using Solr 3.1 trunk.
>> 
>> Thanks!
>> 
>> Jeff
>> --
>> Jeff Schmidt
>> 535 Consulting
>> j...@535consulting.com
>> http://www.535consulting.com
>> 
>> 

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com


Re: New PHP API for Solr (Logic Solr API)

2011-03-07 Thread dan whelan

When are you going to complete the Texis Search API?



On 3/6/11 2:31 PM, Burak wrote:

Hello,

I have recently finished writing a PHP API for Solr and have released 
it under the Apache License. The project is called "Logic Solr API" 
and is located at https://github.com/buraks78/Logic-Solr-API/wiki. It 
has good unit test coverage (over 90%) but is still in alpha. So I am 
primarily interested in some feedback and help for testing if anybody 
is interested as my test setup is pretty limited in regards to the 
Solr version (1.4.1), PHP version (5.3.5), and Solr setup (data 
required for testing certain features fully is missing). The 
documentation is located at 
https://github.com/buraks78/Logic-Solr-API/wiki. Although it is pretty 
weak at this point, I believe it can get you started. I also have 
phpdocs under docs/api folder in the package if needed.


Burak






Re: dismax, and too much qf?

2011-03-07 Thread Jonathan Rochkind
I use about that many qf's in Solr 1.4.1.   It works. I'm not entirely 
sure if it has performance implications -- I do have searching that is 
somewhat slower then I'd like, but I'm not sure if the lengthy qf is a 
contributing factor, or other things I'm doing (like a dozen different 
facet.fields too!).   I haven't profiled everything.  But it doesn't 
grind my Solr to a halt or anything, it works.


Separately, I've also been thinking of other ways to get similar 
highlighting behavior as you describe, give the 'field' that the match 
was in in the highlight response, but haven't come up with anything 
great, if your approach works, that's cool.  I've been trying to think 
of a way to store a single stored field in a structured format (CSV? 
XML?), and somehow have the highlighter return the complete 'field' that 
matches, not just the surrounding X words. But haven't gotten anywhere 
on that, just an idle thought.


Jonathan

On 3/4/2011 10:09 AM, Jeff Schmidt wrote:

Hello:

I'm working on implementing a requirement where when a document is returned, we want to 
pithily tell the end user why. That is, say, with five documents returned, they may be so 
for similar or different reasons. These "reasons" are the field(s) in which 
matches occurred.  Some are more important than others, and I'll have to return just the 
most relevant one or two reasons to not overwhelm the user.

This is a separate goal than Solr's scoring of the returned documents. That is, 
index/query time boosting can indicate which fields are more significant in computing the 
overall document score, but then I need to know what fields where, matched with what 
terms. I do have an application that stands between Solr and the end user (RESTful API), 
so I figured I can rank the "reasons" and return more domain specific names 
rather than the Solr fields names.

So, I've turned to highlighting, and in the results I can see for each document ID 
the fields matched, and the text in the field etc. Great. But,  to get that to work, 
I have to specifically query individual fields. That is, the approach 
of <copyField>'ing a bunch of fields to a common text field for efficiency 
purposes is no longer an option. And, using the dismax request handler, I'm querying 
a lot of fields:

  <str name="qf">
 n_nameExact^4.0
 n_macromolecule_nameExact^3.0
 n_macromolecule_name^2.0
 n_macromolecule_id^1.8
 n_pathway_nameExact^1.5
 n_top_regulates
 n_top_regulated_by
 n_top_binds
 n_top_role_in_cell
 n_top_disease
 n_molecular_function
 n_protein_family
 n_subcell_location
 n_pathway_name
 n_cell_component
 n_bio_process
 n_synonym^0.5
 n_macromolecule_summary^0.6
 p_nameExact^4.0
 p_name^2.0
 p_description^0.6
  </str>

Is that crazy?  Is telling Solr to look at so many individual fields going to 
be a performance problem?  I'm only prototyping at this stage and it works 
great. :)  I've not run anything yet at scale handling lots of requests.

There are two document types in that shared index, demarcated using a field 
named type.  So, when configuring the SolrJ SolrQuery, I do setup 
addFilterQuery() to select one or the other type.

Anyway, using dismax with all of those query fields along with highlighting, I 
get the information I need to render meaningful results for the end user.  But, 
it has a sort of smell to it. :)   Shall I look for another way, or am I 
worrying about nothing?

I am current using Solr 3.1 trunk.

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com




Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() & SegmentReader$CoreReaders.getTermsReader

2011-03-07 Thread Yonik Seeley
On Mon, Mar 7, 2011 at 9:44 AM, Rachita Choudhary
 wrote:
> As enum method , will create a bitset for all the unique values

It's more complex than that.
 - small sets will use a sorted int set... not a bitset
 - you can control what gets cached via facet.enum.cache.minDf parameter

-Yonik
http://lucidimagination.com
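
For reference, both knobs mentioned above are plain request parameters; a sketch of a faceting request string (the field name is a placeholder):

```
facet=true&facet.field=category&facet.method=enum&facet.enum.cache.minDf=100
```

With minDf set, terms whose document frequency falls below the threshold are counted by iterating postings directly instead of going through the filterCache, which bounds how many filters get cached.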


Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() & SegmentReader$CoreReaders.getTermsReader

2011-03-07 Thread Rachita Choudhary
Hi Yonik,

Thanks for the information, but we are still facing issues related to
slowness and high memory usage.

As per my understanding, the default 'FC' method suits our use case, as we
have about 1.1 million documents in total and the no. of unique values for the
facet fields is quite high.
We facet on 5 fields and the no. of unique values are:
Field 1 : 19,000
Field 2 : 19,000
Field 3 : 55,000
Field 4:  474
Field 5 : 27 (The alphabetical faceting)

All the facet fields are of type string and multivalued.

As the enum method will create a bitset for all the unique values, it would
consume more memory than the fc method.
Also, even with a field value cache size of '100', the heap memory (max 6 GB)
gets consumed pretty fast.

With about 60 parallel requests contributing about 4 million queries, about
25% of our queries have a QTime above 1 sec.
The max QTime shoots up to 55 sec.

Debugging deeper into the Solr and Lucene code, the particular method that
slows us down is IndexSearcher.numDocs, which internally gets the terms by
loading them from the index.
I have not been able to determine the root cause of this.

Any other pointers/suggestions in this regard will be helpful.

Thanks,
Rachita

On Tue, Feb 22, 2011 at 10:42 PM, Yonik Seeley
wrote:

> On Tue, Feb 22, 2011 at 9:13 AM, Rachita Choudhary
>  wrote:
> > Hi Solr Users,
> >
> > We are upgrading from Solr 1.3 to Solr 1.4.1.
> > While using Solr 1.3 , we were seeing multiple blocking active threads on
> > "org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal() ".
> >
> > To utilize the benefits of NIO, on upgrading to Solr 1.4.1, we see other
> > type of multiple blocking threads on
> > "org.apache.solr.request.UnInvertedField.getUnInvertedField()  &
> >
> > SegmentReader$CoreReaders.getTermsReader".
> > Due to this, the QTimes shoots up from few hundreds to thousand of
> > msec.. even going upto 30-40 secs for a single query.
> >
> > - The multiple blocking threads show up after few thousands of queries.
> > - We do not have faceting and sorting on the same fields.
> > - Our facet fields are multivalued text fields, but no large text values
> are
> > present.
> > - Index size - around 10 GB
> > - We have not specified any method for faceting in our schema.xml.
> > - Our field value cache settings are:
> >   <fieldValueCache
> >      class="solr.FastLRUCache"
> >      size="175"
> >      autowarmCount="0"
> >      showItems="10"
> >   />
> >
> > Can someone please tell us the why we are seeing these blocked threads ?
> > Also if they are related to our field value cache , then a cache of size
> 175
> > will be filled up with very few initial queries and right after that we
> > should see multiple blocking threads ?
> > What difference it will make if we have "facet.method = enum" ?
>
> fc method on a multivalued field instantiates an UnInvertedField (like
> a multi-valued field cache) which can take some time.
> Just like sorting, you may want to use some warming faceting queries
> to make sure that real queries don't pay the cost of the initial entry
> construction.
>
> From your fieldValueCache statistics, it looks like the number of
> terms is low enough that the enum method may be fine here.
>
> -Yonik
> http://lucidimagination.com
>
>
> > Is this all related to fieldValueCache or is there some other
> configuration
> > which we need to set to avoid these blocking threads?
> >
> > Thanks,
> > Rachita
> >
> > *Cache values example:
> > *facetField1_27443 :
> >
> {field=facet1_27443,memSize=4214884,tindexSize=52,time=22,phase1=15,nTerms=4,bigTerms=0,termInstances=6,uses=1}
> >
> > facetField1_70 :
> >
> {field=facetField1_70,memSize=4223310,tindexSize=308,time=28,phase1=21,nTerms=636,bigTerms=0,termInstances=14404,uses=1}
> >
> > facetField2 :
> {field=facetField2,memSize=4262644,tindexSize=3156,time=273,phase1=267,nTerms=12188,bigTerms=0,termInstances=1255522,uses=7031}
>
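
The warming suggestion in the quoted reply can be expressed as a newSearcher listener in solrconfig.xml; a sketch, with the facet field name borrowed from the cache statistics above:

```
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">facetField2</str>
    </lst>
  </arr>
</listener>
```

The listener runs after each commit, so the UnInvertedField entries are rebuilt before real queries arrive and user requests no longer pay the construction cost.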


Re: Solr Autosuggest help

2011-03-07 Thread rahul
hi..

thanks for your replies..

It seems I mistakenly put ShingleFilterFactory in another field. When I put
the factory in the correct field, it works fine now. 

Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2645780.html
Sent from the Solr - User mailing list archive at Nabble.com.


StreamingUpdateSolrServer

2011-03-07 Thread Isan Fulia
Hi all,
I am using StreamingUpdateSolrServer with queueSize=5 and threadCount=4.
The number of connections created is the same as the thread count.
Does it create a new connection for every thread?


-- 
Thanks & Regards,
Isan Fulia.


Re: Solr Autosuggest help

2011-03-07 Thread Ahmet Arslan
> I have added the following line in both the index-time analyzer section
> and in the query-time analyzer section in
> schema.xml.
> 
> <filter class="solr.ShingleFilterFactory"
> maxShingleSize="2"
> outputUnigrams="true" outputUnigramIfNoNgram="true"/>
> 
> And reindex my content. However, if I query solr for the
> multi work search
> terms suggestion , it only send the single word
> suggestions.
> 
> http://localhost:8080/solr/mydata/select?qt=/terms&terms=true&terms.fl=content&terms.lower=java&terms.prefix=java&terms.lower.incl=false&indent=true
> 
> It wont return the words like 'java final', it only returns
> words like
> javadoc, javascript..
> 
> Could any one update me how to correct this.. or what I am
> missing..

What happens when you add &terms.limit=-1 to your search URL?

Or when you use java plus one blank character in terms.prefix?
&terms.prefix=java &indent=true

Can you see multi-word terms in admin/schema.jsp page?





Re: New PHP API for Solr (Logic Solr API)

2011-03-07 Thread Lukas Kahwe Smith

On 07.03.2011, at 09:43, Stefan Matheis wrote:

> Burak,
> 
> what's wrong with the existing PHP-Extension
> (http://php.net/manual/en/book.solr.php)?


the main issue I see with it is that the API isn't "designed" much, aka it just 
exposes lots of features with dedicated methods but doesn't focus on keeping 
the API easy to overview (aka keep simple things simple and make complex stuff 
possible). At the same time, fundamental stuff like quoting is not covered.

that being said, I do not think we really need a proliferation of Solr APIs 
for PHP, even if this one is based on PHP 5.3 (namespaces etc). BTW, there is 
already another PHP 5.3-based API, though it tries to also unify other 
Lucene-based APIs as much as possible:
https://github.com/dstendardi/Ariadne

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: New PHP API for Solr (Logic Solr API)

2011-03-07 Thread Stefan Matheis
Burak,

what's wrong with the existing PHP-Extension
(http://php.net/manual/en/book.solr.php)?

Regards
Stefan

On Sun, Mar 6, 2011 at 11:31 PM, Burak  wrote:
> Hello,
>
> I have recently finished writing a PHP API for Solr and have released it
> under the Apache License. The project is called "Logic Solr API" and is
> located at https://github.com/buraks78/Logic-Solr-API/wiki. It has good unit
> test coverage (over 90%) but is still in alpha. So I am primarily interested
> in some feedback and help for testing if anybody is interested as my test
> setup is pretty limited in regards to the Solr version (1.4.1), PHP version
> (5.3.5), and Solr setup (data required for testing certain features fully is
> missing). The documentation is located at
> https://github.com/buraks78/Logic-Solr-API/wiki. Although it is pretty weak
> at this point, I believe it can get you started. I also have phpdocs under
> docs/api folder in the package if needed.
>
> Burak
>
>
>


Re: Drop documents when indexing with DIH

2011-03-07 Thread Stefan Matheis
Rosa,

try http://wiki.apache.org/solr/DataImportHandler#Special_Commands

HTH
Stefan

On Fri, Mar 4, 2011 at 9:44 PM, Rosa (Anuncios)
 wrote:
> Hi,
>
> Is it possible to skip documents when indexing with DIH, based on a regex to
> filter certain "badwords" for example?
>
> Thanks for your help,
>
> rosa
>
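
Following the Special_Commands link above, skipping a document amounts to setting the $skipDoc flag from a transformer; a sketch using a ScriptTransformer in data-config.xml (the entity, column name, and word list are made up):

```
<dataConfig>
  <script><![CDATA[
    function dropBadWords(row) {
      var text = row.get('description');
      // Setting the special $skipDoc flag tells DIH to drop this row.
      if (text != null && /badword1|badword2/.test(text)) {
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" transformer="script:dropBadWords" query="...">
      ...
    </entity>
  </document>
</dataConfig>
```

The same flag can be set from a custom Java Transformer if the filtering logic outgrows a script.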