Re: 2 solr dataImport requests on a single core at the same time

2010-07-22 Thread kishan

please help me
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/2-solr-dataImport-requests-on-a-single-core-at-the-same-time-tp978649p986351.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dismax query response field number

2010-07-22 Thread scrapy

No, i'm talking about fields.

In my schema I've got about 15 fields with stored="true".

Like this:
   <field name="city" type="text" indexed="true" stored="true"/>

But when I run a query it returns only 10 fields; the last 4 or 5 are not in
the response??



 

-Original Message-
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, Jul 22, 2010 2:47 am
Subject: Re: Dismax query response field number


Fields or documents? It will return all of the fields that are 'stored'.

The default number of documents to return is 10. Returning all of the
documents is very slow, so you have to request that with the rows=
parameter.
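
For reference, a minimal SolrJ sketch of the same request (assuming an
existing CommonsHttpSolrServer named server; the query value is illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  SolrQuery query = new SolrQuery("moto");
  query.setQueryType("dismax");  // qt=dismax
  query.setRows(500);            // up to 500 documents instead of the default 10
  QueryResponse rsp = server.query(query);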



On Wed, Jul 21, 2010 at 3:32 PM,  scr...@asia.com wrote:







  Hi,



 It seems that not all fields are returned in the query response when i use
DISMAX? Only the first 10??



 Any idea?



 Here is my solrconfig:



  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <str name="fl">*</str>
      <float name="tie">0.01</float>
      <str name="qf">
        text^0.5 content^1.1 title^1.5
      </str>
      <str name="pf">
        text^0.2 content^1.1 title^1.5
      </str>
      <str name="bf">
        recip(price,1,1000,1000)^0.3
      </str>
      <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
      </str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
      <!-- example highlighter config, enable per-query with hl=true -->
      <str name="hl.fl">text features name</str>
      <!-- for this field, we want no fragmenting, just highlighting -->
      <str name="f.name.hl.fragsize">0</str>
      <!-- instructs Solr to return the field itself if no query terms are
           found -->
      <str name="f.name.hl.alternateField">name</str>
      <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>













-- 

Lance Norskog

goks...@gmail.com


 


Re: Dismax query response field number

2010-07-22 Thread Grijesh.singh

Do you have data in that field also? Solr returns only fields which have data.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-query-response-field-number-tp985567p986417.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clustering results limit?

2010-07-22 Thread Stanislaw Osinski
Hi,

 I am attempting to cluster a query. It kinda works, but where my
 (regular) query returns 500 results the cluster only shows 1-10 hits for
  each cluster (5 clusters). Never more than 10 docs and I know it's not
 right. What could be happening here? It should be showing dozens of
 documents per cluster.


Just to clarify -- how many documents do you see in the response (the <result
name="response" ...> section)? Clustering is performed on the search results
(in real time), so if you request 10 results, clustering will apply only to
those 10 results. To get a larger number of clusters you'd need to request
more results, e.g. 50, 100, 200 etc. Obviously, the trade-off here is that
it will take longer to fetch the documents from the index, and clustering time
will also increase. For some guidance on choosing the clustering algorithm,
you can take a look at the following section of the Carrot2 manual:
http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.choosing-algorithm

Cheers,

Staszek


Re: 2 solr dataImport requests on a single core at the same time

2010-07-22 Thread Alexey Serba
DataImportHandler does not support parallel execution of several
requests. You should either send your requests sequentially or
register several DIH handlers in solrconfig and use them in parallel.
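
As a rough sketch of the second option, assuming two hypothetical handlers
named /dataimport-a and /dataimport-b are registered in solrconfig.xml (the
names are illustrative):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.params.ModifiableSolrParams;

  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

  ModifiableSolrParams p = new ModifiableSolrParams();
  p.set("qt", "/dataimport-a");     // route to the first DIH handler
  p.set("command", "full-import");
  server.query(p);                  // kick off the first import

  p.set("qt", "/dataimport-b");     // the second handler can run concurrently
  server.query(p);                  // kick off the second import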


On Thu, Jul 22, 2010 at 11:20 AM, kishan mklpra...@gmail.com wrote:

 please help me
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/2-solr-dataImport-requests-on-a-single-core-at-the-same-time-tp978649p986351.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dismax query response field number

2010-07-22 Thread scrapy

 Yes i've data... maybe my query is wrong?

select?q=moto&qt=dismax&q=city:Paris

Field city is not showing?

 


 

 

-Original Message-
From: Grijesh.singh pintu.grij...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, Jul 22, 2010 10:07 am
Subject: Re: Dismax query response field number



Do you have data in that field also? Solr returns only fields which have data.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-query-response-field-number-tp985567p986417.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

2010-07-22 Thread Bilgin Ibryam
Are you using the same instance of CommonsHttpSolrServer for all the
requests?

On Wed, Jul 21, 2010 at 4:50 PM, Sharp, Jonathan jsh...@coh.org wrote:


 Some further information --

 I tried indexing a batch of PDFs with the client and Solr CELL, setting
 the credentials in the httpclient. For some reason after successfully
 indexing several hundred files I start getting a SolrException:
 Unauthorized and an info message (for every subsequent file):

 INFO basic authentication scheme selected
 org.apache.commons.httpclient.HttpMethodDirector process
 WWWAuthChallenge
 INFO Failure authenticating with BASIC 'realm'@host:port

 I increased session timeout in web.xml with no change. I'm looking
 through the httpclient authentication now.

 -Jon

 -Original Message-
 From: Sharp, Jonathan
 Sent: Friday, July 16, 2010 8:59 AM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: Securing Solr 1.4 in a glassfish container AS NEW THREAD

 Hi Bilgin,

 Thanks for the snippet -- that helps a lot.

 -Jon

 -Original Message-
 From: Bilgin Ibryam [mailto:bibr...@gmail.com]
 Sent: Friday, July 16, 2010 1:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

 Hi Jon,

 SolrJ (CommonsHttpSolrServer) internally uses Apache HttpClient to connect
 to Solr. You can check there for some documentation.
 I secured Solr also with the BASIC auth-method and use the following snippet
 to access it from SolrJ:

   // set username and password
   ((CommonsHttpSolrServer) server).getHttpClient()
       .getParams().setAuthenticationPreemptive(true);
   Credentials defaultcreds =
       new UsernamePasswordCredentials("username", "secret");
   ((CommonsHttpSolrServer) server).getHttpClient().getState()
       .setCredentials(new AuthScope("localhost", 80, AuthScope.ANY_REALM),
                       defaultcreds);

 HTH
 Bilgin Ibryam



 On Fri, Jul 16, 2010 at 2:35 AM, Sharp, Jonathan jsh...@coh.org wrote:

  Hi All,
 
  I am considering securing Solr with basic auth in glassfish using the
  container, by adding to web.xml and adding sun-web.xml file to the
  distributed WAR as below.
 
  If using SolrJ to index files, how can I provide the credentials for
  authentication to the http-client? (Or can someone point me in the direction
  of the right documentation, or anything that will help me make the
  appropriate modifications.)
 
  Also any comment on the below is appreciated.
 
  Add this to web.xml
  ---
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>SomeRealm</realm-name>
  </login-config>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Admin Pages</web-resource-name>
      <url-pattern>/admin</url-pattern>
      <url-pattern>/admin/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeAdminRole</role-name>
    </auth-constraint>
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Update Servlet</web-resource-name>
      <url-pattern>/update/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeUpdateRole</role-name>
    </auth-constraint>
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Select Servlet</web-resource-name>
      <url-pattern>/select/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeSearchRole</role-name>
    </auth-constraint>
  </security-constraint>
  ---
 
  Also add this as sun-web.xml

  ---
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Application
  Server 9.0 Servlet 2.5//EN"
  "http://www.sun.com/software/appserver/dtds/sun-web-app_2_5-0.dtd">
  <sun-web-app error-url="">
    <context-root>/Solr</context-root>
    <jsp-config>
      <property name="keepgenerated" value="true">
        <description>Keep a copy of the generated servlet class' java
        code.</description>
      </property>
    </jsp-config>
    <security-role-mapping>
      <role-name>SomeAdminRole</role-name>
      <group-name>SomeAdminGroup</group-name>
    </security-role-mapping>
    <security-role-mapping>
      <role-name>SomeUpdateRole</role-name>

Re: Dismax query response field number

2010-07-22 Thread Peter Karich
maybe it's too simple, but did you try rows=20 or sth. greater, as
Lance suggested?
=>

select?rows=20&qt=dismax

Regards,
Peter.

  Yes i've data... maybe my query is wrong?

 select?q=moto&qt=dismax&q=city:Paris

 Field city is not showing?

  


  

  

 -Original Message-
 From: Grijesh.singh pintu.grij...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 10:07 am
 Subject: Re: Dismax query response field number



 Do you have data in that field also? Solr returns only fields which have data.
   


-- 
http://karussell.wordpress.com/



Re: solrconfig.xml and xinclude

2010-07-22 Thread Tommaso Teofili
Hi,
I am trying to do a similar thing within the schema.xml (using Solr 1.4.1),
having a (super)schema that is common to 2 instances and specific fields I
would like to include (with XInclude).
Something like this:

<schema name="dummy" ...>
   ...
  <field name="A" type="string" indexed="true" stored="false"
required="false" multiValued="true"/>
  <field name="B" type="string" indexed="true" stored="false"
required="false" multiValued="true"/>
  <field name="C" type="string" indexed="true" stored="true"
required="false"/>
  <!-- xincluding here -->
  <xi:include href="solr/conf/specific_fields_1.xml" parse="xml">
   <xi:fallback>
    <xi:include href="solr/conf/specific_fields_2.xml" parse="xml"/>
   </xi:fallback>
  </xi:include>
  ...
</schema>

and it works with the specific_fields_1.xml (or specific_fields_2.xml) like
the following:

<field name="first_specific_field" type="string" indexed="true"
stored="true" required="false"/>

but it stops working when I add more than one field in the included XML:

<fields>
  <field name="first_specific_field" type="string" indexed="true"
stored="true" required="false"/>
  <field name="second_specific_field" type="string" indexed="true"
stored="false" required="false"/>
</fields>

and consequently modify the including element as following:

  <xi:include href="solr/conf/specific_fields_1.xml" parse="xml"
xpointer="/fields/field">
   <xi:fallback>
    <xi:include href="solr/conf/specific_fields_2.xml"
parse="xml" xpointer="/fields/field"/>
   </xi:fallback>
  </xi:include>

I tried to modify the xpointer attribute value to:
fields/field
fields/*
/fields/*
element(/fields/field)
element(/fields/*)
element(fields/field)
element(fields/*)
but I had no luck.


Fiedzia, I think that xpointer=xpointer(something) won't work, as you can
read in the last sentence of the page regarding SolrConfig.xml [1].
I took a look at the Solr source code and I found a JUnit test for the
XInclusion that tests the inclusion documented in the wiki [2][3].
I also found an entry on the Lucid Imagination website at [4] but couldn't fix
my issue.
Please, could someone help us regarding the right way to configure
XInclude inside Solr?
Thanks in advance for your time.
Cheers,
Tommaso

[1] : http://wiki.apache.org/solr/SolrConfigXml
[2] : http://wiki.apache.org/solr/SolrConfigXml#XInclude
[3] :
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/test/org/apache/solr/core/TestXIncludeConfig.java
[4] :
http://www.lucidimagination.com/search/document/31a60b7ccad76de1/is_it_possible_to_use_xinclude_in_schema_xml


2010/7/21 fiedzia fied...@gmail.com


 I am trying to export some config options common to all cores into a single
 file,
 which would be included using xinclude. The only problem is how to include
 the children of a given node.


 common_solrconfig.xml looks like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <config>
  <lib dir="/solr/lib" />
 </config>


 solrconfig.xml looks like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <config>
 <!-- xinclude here -->
 </config>


 now all of the following attempts have failed:

 <xi:include href="/solr/common_solrconfig.xml"
 xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>
 <xi:include href="/solr/common_solrconfig.xml" xpointer="config/*"
 xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>
 <xi:include href="/solr/common_solrconfig.xml"
 xpointer="xpointer(config/*)"
 xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>

 <xi:include href="/solr/common_solrconfig.xml" xpointer="element(config/*)"
 xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solrconfig-xml-and-xinclude-tp984058p984058.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dismax query response field number

2010-07-22 Thread Chantal Ackermann
is this a typo in your query or in your e-mail?

you have the q parameter twice.
use fq for query inputs that mention a field explicitly when using
dismax.

So it should be:
select?q=moto&qt=dismax & fq =city:Paris

(the whitespace is only for visualization)


chantal


On Thu, 2010-07-22 at 11:03 +0200, scr...@asia.com wrote:
 Yes i've data... maybe my query is wrong?
 
 select?q=moto&qt=dismax&q=city:Paris
 
 Field city is not showing?
 
  
 
 
 
 
 
 
 -Original Message-
 From: Grijesh.singh pintu.grij...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 10:07 am
 Subject: Re: Dismax query response field number
 
 
 
 Do you have data in that field also? Solr returns only fields which have data.





Re: Dismax query response field number

2010-07-22 Thread scrapy

 Thanks,

That was the problem!




select?q=moto&qt=dismax & fq =city:Paris


 

 


 

 

-Original Message-
From: Chantal Ackermann chantal.ackerm...@btelligent.de
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thu, Jul 22, 2010 12:47 pm
Subject: Re: Dismax query response field number


is this a typo in your query or in your e-mail?

you have the q parameter twice.
use fq for query inputs that mention a field explicitly when using
dismax.

So it should be:
select?q=moto&qt=dismax & fq =city:Paris

(the whitespace is only for visualization)


chantal


On Thu, 2010-07-22 at 11:03 +0200, scr...@asia.com wrote:
 Yes i've data... maybe my query is wrong?
 
 select?q=moto&qt=dismax&q=city:Paris
 
 Field city is not showing?
 
  
 
 
 
 
 
 
 -Original Message-
 From: Grijesh.singh pintu.grij...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 10:07 am
 Subject: Re: Dismax query response field number
 
 
 
 Do you have data in that field also? Solr returns only fields which have data.




 


Re: Clustering results limit?

2010-07-22 Thread Darren Govoni
Staszek,
  Thank you. The cluster response has a maximum of 10 documents in each
cluster. I didn't set this limit and the query by itself returns 500+
documents. There should be many more than 10 in each cluster. Does it
default to 10 maybe? Or is there a way to say, cluster every result in
the query?

thank you, I will read the links again,
Darren

On Thu, 2010-07-22 at 10:15 +0200, Stanislaw Osinski wrote:

 Hi,
 
  I am attempting to cluster a query. It kinda works, but where my
  (regular) query returns 500 results the cluster only shows 1-10 hits for
   each cluster (5 clusters). Never more than 10 docs and I know it's not
  right. What could be happening here? It should be showing dozens of
  documents per cluster.
 
 
 Just to clarify -- how many documents do you see in the response (the <result
 name="response" ...> section)? Clustering is performed on the search results
 (in real time), so if you request 10 results, clustering will apply only to
 those 10 results. To get a larger number of clusters you'd need to request
 more results, e.g. 50, 100, 200 etc. Obviously, the trade-off here is that
 it will take longer to fetch the documents from the index, and clustering time
 will also increase. For some guidance on choosing the clustering algorithm,
 you can take a look at the following section of Carrot2 manual:
 http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.choosing-algorithm
 .
 
 Cheers,
 
 Staszek




Re: Clustering results limit?

2010-07-22 Thread Darren Govoni
I set the rows=50 on my clustering URL in a browser and it returns more.

In my SolrJ, I used ModifiableSolrParams and I set ("rows", 50) but it
still returns less than 10 for each cluster.

Is there a way to set rows wanted with ModifiableSolrParams?
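
For reference, the pattern in question looks roughly like this (a sketch;
assumes an existing SolrServer named server, and /clustering is a hypothetical
name for the clustering request handler):

  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("qt", "/clustering");  // hypothetical handler name
  params.set("q", "*:*");
  params.set("rows", 50);           // cluster the top 50 results, not the default 10
  QueryResponse rsp = server.query(params);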

thanks and sorry for the double post.

Darren

On Thu, 2010-07-22 at 10:15 +0200, Stanislaw Osinski wrote:

 Hi,
 
  I am attempting to cluster a query. It kinda works, but where my
  (regular) query returns 500 results the cluster only shows 1-10 hits for
   each cluster (5 clusters). Never more than 10 docs and I know it's not
  right. What could be happening here? It should be showing dozens of
  documents per cluster.
 
 
 Just to clarify -- how many documents do you see in the response (the <result
 name="response" ...> section)? Clustering is performed on the search results
 (in real time), so if you request 10 results, clustering will apply only to
 those 10 results. To get a larger number of clusters you'd need to request
 more results, e.g. 50, 100, 200 etc. Obviously, the trade-off here is that
 it will take longer to fetch the documents from the index, and clustering time
 will also increase. For some guidance on choosing the clustering algorithm,
 you can take a look at the following section of Carrot2 manual:
 http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.choosing-algorithm
 .
 
 Cheers,
 
 Staszek




Re: Dismax query response field number

2010-07-22 Thread Justin Lolofie
scrapy what version of solr are you using?

I'd like to do fq=city:Paris but it doesn't seem to work for me (Solr
1.4) and the docs seem to suggest it's a feature that is coming but not
there yet? Or maybe I misunderstood?


On Thu, Jul 22, 2010 at 6:00 AM,  scr...@asia.com wrote:

  Thanks,

 That was the problem!




 select?q=moto&qt=dismax & fq =city:Paris











 -Original Message-
 From: Chantal Ackermann chantal.ackerm...@btelligent.de
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 12:47 pm
 Subject: Re: Dismax query response field number


 is this a typo in your query or in your e-mail?

 you have the q parameter twice.
 use fq for query inputs that mention a field explicitly when using
 dismax.

 So it should be:
  select?q=moto&qt=dismax & fq =city:Paris

 (the whitespace is only for visualization)


 chantal


 On Thu, 2010-07-22 at 11:03 +0200, scr...@asia.com wrote:
 Yes i've data... maybe my query is wrong?

  select?q=moto&qt=dismax&q=city:Paris

 Field city is not showing?








 -Original Message-
 From: Grijesh.singh pintu.grij...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 10:07 am
 Subject: Re: Dismax query response field number



  Do you have data in that field also? Solr returns only fields which have data.








Re: Dismax query response field number

2010-07-22 Thread scrapy

 I'm using Solr 1.4.1

 


 

 

-Original Message-
From: Justin Lolofie jta...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, Jul 22, 2010 2:57 pm
Subject: Re: Dismax query response field number


scrapy what version of solr are you using?

I'd like to do fq=city:Paris but it doesn't seem to work for me (Solr
1.4) and the docs seem to suggest it's a feature that is coming but not
there yet? Or maybe I misunderstood?


On Thu, Jul 22, 2010 at 6:00 AM,  scr...@asia.com wrote:

  Thanks,

 That was the problem!




 select?q=moto&qt=dismax & fq =city:Paris











 -Original Message-
 From: Chantal Ackermann chantal.ackerm...@btelligent.de
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 12:47 pm
 Subject: Re: Dismax query response field number


 is this a typo in your query or in your e-mail?

 you have the q parameter twice.
 use fq for query inputs that mention a field explicitly when using
 dismax.

 So it should be:
 select?q=moto&qt=dismax & fq =city:Paris

 (the whitespace is only for visualization)


 chantal


 On Thu, 2010-07-22 at 11:03 +0200, scr...@asia.com wrote:
 Yes i've data... maybe my query is wrong?

 select?q=moto&qt=dismax&q=city:Paris

 Field city is not showing?








 -Original Message-
 From: Grijesh.singh pintu.grij...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, Jul 22, 2010 10:07 am
 Subject: Re: Dismax query response field number



 Do you have data in that field also? Solr returns only fields which have data.







 


Using Solr to perform range queries in Dspace

2010-07-22 Thread Mckeane

I'm trying to use DSpace to search across a range of indexes created and stored
using the DSIndexer.java class. I have seen that Solr can be used to perform
numerical range queries using either the TrieIntField,
TrieDoubleField, TrieLongField, etc. classes defined in Solr's API, or
SortableIntField.java, SortableLongField.java, SortableDoubleField.java. I would
like to know how to implement these classes in DSpace so that I can be able
to perform numerical range queries. Any help would be greatly appreciated.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-to-perform-range-queries-in-Dspace-tp987049p987049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Getting FileNotFoundException with repl command=backup?

2010-07-22 Thread Peter Sturge
Informational

Hi,

This information is for anyone who might be running into problems when
performing explicit periodic backups of Solr indexes. I encountered this
problem, and hopefully this might be useful to others.
A related Jira issue is: SOLR-1475.

The issue is: When you execute a 'command=backup' request, the snapshot
starts, but then fails later on with file not found errors. This aborts the
snapshot, and you end up with no backup.

This error occurs if, during the backup process, Solr performs more commits
than its 'maxCommitsToKeep' setting in solrconfig.xml. If you don't commit
very often, you probably won't see this problem.
If, however, like me, you have Solr committing very often, the commit point
files for the backup can get deleted before the backup finishes. This is
particularly true of larger indexes, where the backup can take some time.

Workaround 1:
One workaround to this is to set 'maxCommitsToKeep' to a number higher than
the total number of commits that can occur during the time it takes to do a
backup. Sounds like a 'finger-in-the-air' number? Well, yes it is.
If you commit every 20secs, and a full backup takes 10mins, you'll want a
value of at least 31. The trouble is, how long will a backup take? This can
vary hugely as the index grows, the system gets busy, disk fragmentation, etc.
(my environment takes ~13mins to backup a 5.5GB index to a local folder)

An inefficiency of this approach that needs to be considered is the higher
the 'maxCommitsToKeep' number is, the more files you're going to have
lounging around in your index data folder - the majority of which never get
used. The collective size of these commit point files can be significant.
If you have a high mergeFactor, the number of files will increase as well.
You can set 'maxCommitAge' to delete old commit points after a certain time
- as long as it's not shorter than the 'worst-case' backup time.

I set my 'maxCommitsToKeep' to 2400, and the file not found errors
disappeared (note that 2400 is a hugely conservative number to cater for a
backup taking 24hrs). My mergeFactor is 25, so I get a high number of files
in the index folder, they are generally small in size, but significant extra
storage can be required.

If you're willing to trade off some (ok, potentially a lot of) extraneous
disk usage to keep commit points around waiting for a backup command, this
approach addresses the problem.

Workaround 2:
A preferable method (IMHO), if you have an extra box, is to set up a read-only
replica and then back up from the replica. You can then tune the slave
to suit your needs.

Coding:
I'm not very familiar with the repl/backup code, but a coded way to address
this might be to save a commit point's index version files when a backup
command is received, then release them for deletion when complete.
Perhaps someone with good knowledge of this part of Solr could comment more
succinctly.


Thanks,
Peter


Tree Faceting in Solr 1.4

2010-07-22 Thread Eric Grobler
Hi Solr Community

If I have:
COUNTRY CITY
Germany Berlin
Germany Hamburg
Spain   Madrid

Can I do faceting like:
Germany
  Berlin
  Hamburg
Spain
  Madrid

I tried to apply SOLR-792 to the current trunk but it does not seem to be
compatible.
Maybe there is a similar feature existing in the latest builds?

Thanks & Regards
Eric


Re: Tree Faceting in Solr 1.4

2010-07-22 Thread SR
Perhaps the following article can help: 
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

-S


On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:

 Hi Solr Community
 
 If I have:
 COUNTRY CITY
 Germany Berlin
 Germany Hamburg
 Spain   Madrid
 
 Can I do faceting like:
 Germany
  Berlin
  Hamburg
 Spain
  Madrid
 
 I tried to apply SOLR-792 to the current trunk but it does not seem to be
 compatible.
 Maybe there is a similar feature existing in the latest builds?
 
 Thanks & Regards
 Eric



Re: Clustering results limit?

2010-07-22 Thread Stanislaw Osinski
Hi,

In my SolrJ, I used ModifiableSolrParams and I set (rows,50) but it
 still returns less than 10 for each cluster.


Oh, the number of documents per cluster very much depends on the
characteristics of your documents; it often happens that the algorithms
create larger numbers of smaller clusters. However, all returned documents
should get assigned to some cluster(s), the "Other Topics" one in the worst
case. Does that hold in your case?

If you'd like to tune clustering a bit, you can try Carrot2 tools:

http://download.carrot2.org/stable/manual/#section.getting-started.solr

and then:

http://download.carrot2.org/stable/manual/#chapter.tuning

Cheers,

S.


Delta import processing duration

2010-07-22 Thread Qwerky

I'm using Solr to index data from our data warehouse. The data is imported
through text files. I've written a custom FileImportDataImportHandler that
extends DataSource and it works fine - I've tested it with 280,000 records
and it manages to build the index in about 3 minutes. My problem is that
doing a delta update seems to take a really long time.

I've written a custom FileUpdateDataImportHandler which takes two files,
one for deletes and one for updates. I've tested with an update file
containing 18,000 records and a delete file containing 30 records - my
custom handler whizzed through them in a few seconds but the page at
/solr/admin/dataimport.jsp says the command is still running (it's been
running nearly an hour).

What's taking so long? Could there be some kind of inefficiency in the way
my update handler works?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-import-processing-duration-tp987562p987562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tree Faceting in Solr 1.4

2010-07-22 Thread Eric Grobler
Thank you for the link.

I was not aware of the multifaceting syntax - this will enable me to run 1
less query on the main page!

However this is not a tree faceting feature.

Thanks
Eric




On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:

 Perhaps the following article can help:
 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

 -S


 On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:

  Hi Solr Community
 
  If I have:
  COUNTRY CITY
  Germany Berlin
  Germany Hamburg
  Spain   Madrid
 
  Can I do faceting like:
  Germany
   Berlin
   Hamburg
  Spain
   Madrid
 
  I tried to apply SOLR-792 to the current trunk but it does not seem to be
  compatible.
  Maybe there is a similar feature existing in the latest builds?
 
   Thanks & Regards
  Eric




Solr on iPad?

2010-07-22 Thread Stephan Schwab

Dear Solr community,

does anyone know whether it may be possible or has already been done to
bring Solr to the Apple iPad so that applications may use a local search
engine?

Greetings,
Stephan

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p987655.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr on iPad?

2010-07-22 Thread Andreas Jung
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Stephan Schwab wrote:
 Dear Solr community,
 
 does anyone know whether it may be possible or has already been done to
 bring Solr to the Apple iPad so that applications may use a local search
 engine?

huh?

Solr requires Java. iPad does not support Java.
Solr is memory and cpu intensive...nothing that fits with the concept
of a tablet pc.

- -aj
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxIfp8ACgkQCJIWIbr9KYwQgQCg0p1oiXuPf17/Vg2JEpVHlZql
bLEAoL46mARjhGkHsz30Kv1Agpf2xp6r
=86KI
-END PGP SIGNATURE-


Re: a bug of solr distributed search

2010-07-22 Thread Yonik Seeley
As the comments suggest, it's not a bug, but just the best we can do
for now since our priority queues don't support removal of arbitrary
elements.  I guess we could rebuild the current priority queue if we
detect a duplicate, but that will have an obvious performance impact.
Any other suggestions?

-Yonik
http://www.lucidimagination.com

On Wed, Jul 21, 2010 at 3:13 AM, Li Li fancye...@gmail.com wrote:
 in QueryComponent.mergeIds. It will remove documents which have a
 duplicated uniqueKey with others. In the current implementation, it uses
 the first encountered.
          String prevShard = uniqueDoc.put(id, srsp.getShard());
          if (prevShard != null) {
            // duplicate detected
            numFound--;
             collapseList.remove(id+"");
            docs.set(i, null);//remove it.
            // For now, just always use the first encountered since we
 can't currently
            // remove the previous one added to the priority queue.
 If we switched
            // to the Java5 PriorityQueue, this would be easier.
            continue;
            // make which duplicate is used deterministic based on shard
            // if (prevShard.compareTo(srsp.shard) = 0) {
            //  TODO: remove previous from priority queue
            //  continue;
            // }
          }

   It iterates over ShardResponse by
  for (ShardResponse srsp : sreq.responses)
  But the sreq.responses may be different. That is -- shard1's result
  and shard2's result may interchange position.
  So when a uniqueKey (such as url) occurs in both shard1 and shard2,
  which one will be used is unpredictable. But the scores of these 2
  docs are different because of different idf.
  So the same query will get different results.
  One possible solution is to sort ShardResponse srsp by shard name.



Re: Solr on iPad?

2010-07-22 Thread mbklein

Hi Stephan,

On a lark, I hacked up solr running under a small-footprint servlet engine
on my jailbroken iPad. You can see the console here: http://imgur.com/tHRh3

It's not a particularly practical solution, though, since Apple would never
approve a Java-based app for the App Store. Or a background service, for
that matter. So it would only ever run on a jailbroken iPad. Even if you're
willing to live with that, keeping the process running in the background all
the time would have a devastating impact on battery life.

Michael

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p987716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Providing token variants at index time

2010-07-22 Thread Paul Dlug
Is there a tokenizer that supports providing variants of the tokens at
index time? I'm looking for something that could take a syntax like:

International|I Business|B Machines|M

Which would take each pipe delimited token and preserve its position
so that phrase queries work properly. The above would result in
queries for "International Business Machines" as well as "I B M" or
any variants. The point is that the variants would be generated
externally as part of the indexing process so they may not be as
simple as the above.

Any ideas or do I have to write a custom tokenizer to do this?


Thanks,
Paul


calling other core from request handler

2010-07-22 Thread Kevin Osborn
I have a multi-core environment and a custom request handler. However, I have 
one place where I would like to have my request handler on coreA query 
coreB. 
This is not distributed search. This is just an independent query to get some 
additional data.

I am also guaranteed that each server will have the same core set. I am also 
guaranteed that I will not be reloading cores (just indexes).

It looks like I can 
call coreA.getCoreDescriptor().getCoreContainer().getCore("coreB"); and then 
get 
the Searcher and release it when I am done.

Is there a better way?

And it also appears that during the inform or init methods of my 
requestHandler, 
coreB is NOT guaranteed to already exist?


  

Re: Providing token variants at index time

2010-07-22 Thread Jonathan Rochkind
I think the Synonym filter should actually do exactly what you want, no? 


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Hmm, maybe not exactly what you want as you describe it. It comes close, 
maybe good enough. Do you REALLY need to support "I Business M" or "I B 
Machines" as source/query? Your spec suggests yes; the synonym filter won't 
easily do that. But if you just want "International Business Machines" == 
"IBM", keeping positions intact for subsequent terms, I think the synonym 
filter will do it. 

If not, I suppose you could look at its source to write your own. Or 
maybe there's some way to combine the PositionFilter with something else 
to do it, but I can't figure one out.


Jonathan

Paul Dlug wrote:

Is there a tokenizer that supports providing variants of the tokens at
index time? I'm looking for something that could take a syntax like:

International|I Business|B Machines|M

Which would take each pipe delimited token and preserve its position
so that phrase queries work properly. The above would result in
queries for International Business Machines as well as I B M or
any variants. The point is that the variants would be generated
externally as part of the indexing process so they may not be as
simple as the above.

Any ideas or do I have to write a custom tokenizer to do this?


Thanks,
Paul

  


Re: a bug of solr distributed search

2010-07-22 Thread Chris Hostetter

: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary

FYI: I updated the DistributedSearch wiki to be more clear about this -- 
it previously didn't make it explicitly clear that docIds were supposed to 
be unique across all shards, and suggested that there was specific 
well-defined behavior when they weren't.


-Hoss



Re: Providing token variants at index time

2010-07-22 Thread Paul Dlug
On Thu, Jul 22, 2010 at 4:01 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 I think the Synonym filter should actually do exactly what you want, no?
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

 Hmm, maybe not exactly what you want as you describe it. It comes close,
 maybe good enough. Do you REALLY need to support "I Business M" or "I B
 Machines" as source/query? Your spec suggests yes; the synonym filter won't
 easily do that. But if you just want "International Business Machines" ==
 "IBM", keeping positions intact for subsequent terms, I think the synonym
 filter will do it.
 If not, I suppose you could look at its source to write your own. Or maybe
 there's some way to combine the PositionFilter with something else to do it,
 but I can't figure one out.

The synonym approach won't work, as I'd need to provide the synonyms in a file.
The variants may be more dynamic and not known in advance; the process
creating the documents to index does have that logic and could easily
put them into the document in a format a tokenizer could pull apart
later.


--Paul


RE: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req

2010-07-22 Thread Chris Hostetter

:  I would like get the total count of the facet.field response values
: 
: I'm pretty sure there's no way to get Solr to do that -- other than not 
: setting a facet.limit, getting every value back in the response, and 
: counting them yourself (not feasible for very large counts).  I've 
: looked at trying to patch Solr to do it, because I could really use it 
: too; it's definitely possible, but made trickier because there are now 
: several different methods that Solr can use to do facetting, with 
: separate code paths.  It seems like an odd omission to me too.

beyond just having multiple facet algorithms for performance making it 
difficult to add this feature, the other issue is the performance of 
computing the number: in some algorithms it's relatively cheap (on a 
single server) but in others it's more expensive than computing the facet 
counts being returned (consider the case where we are sorting in term 
order - once we have collected counts for ${facet.limit} constraints, we 
can stop iterating over terms -- but to compute the total number of 
constraints (ie: terms) we would have to keep going and test every one of 
them against ${facet.mincount})

With distributed searching it becomes even more prohibitive -- your 
description of using an infinite facet.limit and asking for every value 
back to count them is exactly what would have to be done internally in a 
distributed faceting situation -- except they couldn't just be counted, 
they'd have to be deduped and then counted)

To do this efficiently, other data structures (denormalized beyond just 
the inverted index level) would need to be built.

-Hoss



Re: stats on a field with no values

2010-07-22 Thread Chris Hostetter
: 
: When I use the stats component on a field that has no values in the result set
: (ie, stats.missing == rowCount), I'd expect that 'min'and 'max' would be
: blank.
: 
: Instead, they seem to be the smallest and largest float values or something,
: min = 1.7976931348623157E308, max = 4.9E-324 .
: 
: Is this a bug?

off the top of my head it sounds like it ... would you mind opening an 
issue in Jira please?


-Hoss



Re: How to get the list of all available fields in a (sharded) index

2010-07-22 Thread Chris Hostetter

: I cannot find any info on how to get the list of current fields in an index
: (possibly sharded). With dynamic fields, I cannot simply parse the schema to

there isn't one -- the LukeRequestHandler can tell you what fields 
*actually* exist in your index, but you'd have to query it on each shard 
to know the full set of concrete fields in the entire distributed index.



-Hoss



Re: about warm up

2010-07-22 Thread Chris Hostetter

: I want to load full text into an external cache, so I added some code
: in newSearcher where I found the warm up takes place. I add my code

...

: public void newSearcher(SolrIndexSearcher newSearcher,
: SolrIndexSearcher currentSearcher) {
: warmTextCache(newSearcher, warmTextCache, new String[]{"title","content"});

...

: in warmTextCache I need a reader to get some docs

...

: So I need a reader. When I construct a reader by myself like:
: IndexReader reader=IndexReader.open(...);
: Or by core.getSearcher().get().getReader()

Don't do that -- the readers/searchers are reference counted by the 
SolrCore, so unless you release your references cleanly you are 
likely to get into some interesting situations.

the newSearcher method you are implementing directly gives you the 
SolrIndexSearcher (the newSearcher arg) that will be used along with 
your cache. why don't you use it to get the reader (the 
getReader() method) instead of jumping through these hoops you've been 
trying? 
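
a minimal sketch of that approach (warmTextCache stands in for the poster's 
own helper, shown only for shape):

  public void newSearcher(SolrIndexSearcher newSearcher,
                          SolrIndexSearcher currentSearcher) {
    // use the searcher Solr hands you -- no extra reference counting needed
    IndexReader reader = newSearcher.getReader();
    warmTextCache(reader, new String[]{"title", "content"});
  }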

 



-Hoss



commit is taking very very long time

2010-07-22 Thread bbarani

Hi,

I am not sure why some commits take a very long time. I have a batch indexing
process which commits just once after it completes the indexing.

I tried to index just 36 rows but the total time taken to index was like 12
minutes. The indexing time itself was very short, just some 30 seconds, but it
took the remaining time for the commit.


<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">dataimportHydrogen.xml</str>
  </lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">4</str>
  <str name="Total Rows Fetched">36</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2010-07-22 15:42:28</str>
  <str name="">
    Indexing completed. Added/Updated: 4 documents. Deleted 0 documents.
  </str>
  <str name="Committed">2010-07-22 15:54:49</str>
  <str name="Optimized">2010-07-22 15:54:49</str>
  <str name="Total Documents Processed">4</str>
  <str name="Time taken ">0:12:21.632</str>
</lst>
<str name="WARNING">
  This response format is experimental.  It is likely to change in the future.
</str>
</response>


I even set the autowarm count to 0 in the solrconfig.xml file, but it was of no
use. Any reason why the commit takes more time?

Also, is there a way to reduce the time it takes?

I have attached my solrconfig / log for your reference.

http://lucene.472066.n3.nabble.com/file/n988220/SOLRerror.log SOLRerror.log 
http://lucene.472066.n3.nabble.com/file/n988220/solrconfig.xml
solrconfig.xml 

Thanks,
BB


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-is-taking-very-very-long-time-tp988220p988220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Novice seeking help to change filters to search without diacritics

2010-07-22 Thread Chris Hostetter

: I am new to Solr and seeking your help to change filter from
: ISOLatin1AccentFilterFactory to ASCIIFoldingFilterFactory files.  I am not

According to the files you posted, you aren't using the 
ISOLatin1AccentFilterFactory -- so problem solved w/o making any changes.

: sure what change is to be made and where exactly this change is to be made.
: And finally, what would replace mapping-ISOLatin1Accent.txt file?  I would

i think what's confusing you is that you are using the 
MappingCharFilterFactory with that file in your text field type to 
convert any ISOLatin1Accent characters to their base characters (i'm 
sure there is a more precise term for it, but i'm not a charset savant 
like rmuir so i don't know what it's called)

: like Solr to search both with and without diacritics found in
: transliteration of Indian languages with characters such as Ā ś ṛ ṇ, etc. 

your existing usage should allow that on any fields using the text type 
-- if you index those characters they will get flattened and if someone 
searches on those characters they will get flattened -- it's just like 
using LowerCaseFilter -- as long as you do it at index and query time 
everything is consistent.

if you want docs to score higher when even the accents match, just index 
and query across two fields: one with that charfilter and one w/o.



-Hoss


Re: Providing token variants at index time

2010-07-22 Thread Jonathan Rochkind

Paul Dlug wrote:

On Thu, Jul 22, 2010 at 4:01 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
  


The synonym approach won't work, as I'd need to provide the synonyms in a file.
The variants may be more dynamic and not known in advance; the process
creating the documents to index does have that logic and could easily
put them into the document in a format a tokenizer could pull apart
later.
Then maybe look at the source code of the synonym filter, and build your 
own filter, copying the parts that do the real work (or even 
sub-classing), but instead of using a file, use the transient state 
information that is, for some reason, only available at indexing time.
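
A minimal sketch of such a filter (a hypothetical class written against the 
Lucene 2.9-era attribute API that Solr 1.4 ships with; untested, just to show 
the shape):

  import java.io.IOException;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public final class PipeVariantFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PositionIncrementAttribute posAtt =
        addAttribute(PositionIncrementAttribute.class);
    private String[] pending;  // variants still to emit for the current token
    private int pendingIndex;

    public PipeVariantFilter(TokenStream input) { super(input); }

    @Override
    public boolean incrementToken() throws IOException {
      if (pending != null && pendingIndex < pending.length) {
        termAtt.setTermBuffer(pending[pendingIndex++]);
        posAtt.setPositionIncrement(0);  // stack the variant at the same position
        return true;
      }
      if (!input.incrementToken()) return false;
      String[] parts = termAtt.term().split("\\|");
      termAtt.setTermBuffer(parts[0]);   // emit the first variant normally
      pending = parts;
      pendingIndex = 1;
      return true;
    }
  }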


I don't entirely understand your use case; if you give some more explicit 
examples, others might have other ideas.


Jonathan


Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req

2010-07-22 Thread Jonathan Rochkind

Chris Hostetter wrote:
computing the number:  in some algorithms it's relatively cheap (on a 
single server) but in others it's more expensive then computing the facet 
counts being returned (consider the case where we are sorting in term 
order - once we have collected counts for ${facet.limit} constraints, we 
can stop iterating over terms -- but to compute the total umber of 
constraints (ie: terms) we would have to keep going and test every one of 
them against ${facet.mincount})
  
I've been told this before, but it still doesn't really make sense to 
me.  How can you possibly find the top N constraints without having at 
least examined all the constraints?  How do you know which are the top N 
if there are some you haven't looked at? And if you've looked at them 
all, it's no problem to increment a counter as you look at each one.  
Although I guess the facet.mincount test does possibly put a crimp in 
things; I don't ever use that param myself with a value other than 1, 
so I hadn't considered it.


But I may be missing something. I've examined only one of the code 
paths/methods for faceting in source code, the one (if my reading was 
correct) that ends up used for high-cardinality multi-valued fields -- 
in that method, it looked like it should add no work at all to give you 
a facet unique value (result set value cardinality) count. (with 
facet.mincount of 1 anyway).  But I may have been mis-reading, or it may 
be that other methods are more troublesome.


At any rate, if I need it bad enough, I'll try to write my own facet 
component that does it (perhaps a subclass of the existing SimpleFacet), 
and see what happens.  It does seem to be something a variety of 
people's use cases could use, I see it mentioned periodically in the 
list serv archives.


Jonathan




WordDelimiterFilter and phrase queries?

2010-07-22 Thread Drew Farris
Hi All,

A question about the WordDelimiterFilter and position increments /
phrase queries:

I have a string like: 3-diphenyl-propanoic

When indexed, it gets broken up into the following tokens:

pos token offset
1 3 0-1
2 diphenyl 2-10
3 propanoic 11-20
3 diphenylpropanoic 2-20

The WordDelimiterFilter has catenateWords set to 1, which causes it to
emit 'diphenylpropanoic'. Note that position for this term is '3'.
(catentateAll is set to 0)

Say someone enters the query string 3-diphenylpropanoic

The query parser I'm using transforms this into a phrase query, and the
indexed form is missed because the positions of the terms '3'
and 'diphenylpropanoic' indicate they are not adjacent?

Is this intended behavior? I expect that the catenated word
'diphenylpropanoic' should have a position of 2 based on the position
of the first term in the concatenation, but perhaps I'm missing
something. This seems to be present in both 1.4.1 and the current
trunk.

- Drew


Re: calling other core from request handler

2010-07-22 Thread Chris Hostetter
: It looks I can 
: call coreA.getCoreDescriptor().getCoreContainer().getCore(coreB); and then 
get 
: the Searcher and release it when I am done.
: 
: Is there a better way?

not really ... not unless you want to do it via HTTP to localhost
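
a rough sketch of that pattern, with the reference counting handled in 
try/finally (the core name is illustrative):

  SolrCore coreB =
      coreA.getCoreDescriptor().getCoreContainer().getCore("coreB");
  try {
    RefCounted<SolrIndexSearcher> ref = coreB.getSearcher();
    try {
      SolrIndexSearcher searcher = ref.get();
      // ... run the independent query against coreB here ...
    } finally {
      ref.decref();  // release the searcher reference
    }
  } finally {
    coreB.close();   // getCore() bumps the core's refcount; close() releases it
  }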

: And it also appears that during the inform or init methods of my 
requestHandler, 
: coreB is NOT guaranteed to already exist?

correct ... your RequestHandler shouldn't make any assumptions about the 
order that cores are initialized in.

-Hoss



Duplicates

2010-07-22 Thread Pavel Minchenkov
Hi,

Is it possible to remove duplicates in search results by a given field?

Thanks.

-- 
Pavel Minchenkov


Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req

2010-07-22 Thread Chris Hostetter
:  being returned (consider the case where we are sorting in term order - once
:  we have collected counts for ${facet.limit} constraints, we can stop
:  iterating over terms -- but to compute the total umber of constraints (ie:
:  terms) we would have to keep going and test every one of them against
:  ${facet.mincount})
:
: I've been told this before, but it still doesn't really make sense to me.  How
: can you possibly find the top N constraints, without having at least examined
: all the contraints?  How do you know which are the top N if there are some you

that's exactly my point: in the scenario where you've asked for 
facet.mincount=N&facet.limit=M&facet.sort=index you don't have to find the 
top constraints, you just have to find the first M terms in index order 
that have a mincount of N.

: But I may be missing something. I've examined only one of the code
: paths/methods for faceting in source code, the one (if my reading was correct)
: that ends up used for high-cardinality multi-valued fields -- in that method,
: it looked like it should add no work at all to give you a facet unique value
: (result set value cardinality) count. (with facet.mincount of 1 anyway).  But
: I may have been mis-reading, or it may be that other methods are more
: troublesome.

in any case where you are sorting by *counts* then yes, all of the 
constraints have to be checked, so you can count them as you go -- but 
that doesn't scale in distributed faceting: you can't just add the counts 
up from each shard because you don't know what the overlap is -- hence my 
comment about how to dedup them.

there are some simple use cases where it's feasible, but in general it's a 
very hard problem.


-Hoss



Re: boosting particular field values

2010-07-22 Thread Chris Hostetter

I believe this came up on IRC, and the end result was that the bq was 
working fine; Justin just wasn't noticing because he added it to his 
solrconfig.xml (and not to the query URL) and his browser was still 
caching the page -- so he didn't see his boost affect anything.

(but i may be confusing justin with someone else)

: I'm using dismax request handler, solr 1.4.
: 
: I would like to boost the weight of certain fields according to their
: values... this appears to work:
: 
: bq=category:electronics^5.5
: 
: However, I think this boosting only affects sorting the results that
: have already matched? So if I only get 10 rows back, I might not get
: any records back that are category electronics. If I get 100 rows, I
: can see that bq is working. However, I only want to get 10 rows.
: 
: How does one affect the kinds of results that are matched to begin
: with? bq is the wrong thing to use, right?
: 
: Thanks for any help,
: Justin
: 



-Hoss



Re: Clustering results limit?

2010-07-22 Thread Darren Govoni
Yeah, my results count is 151 and only 21 documents appear in 6
clusters.

This is true whether I use URL or SolrJ.

When I use the Carrot2 workbench and point it to my Solr using local
clustering, the workbench has numerous clusters and all documents are placed.

On Thu, 2010-07-22 at 18:06 +0200, Stanislaw Osinski wrote:

 Hi,
 
 In my SolrJ, I used ModifiableSolrParams and I set (rows,50) but it
  still returns less than 10 for each cluster.
 
 
  Oh, the number of documents per cluster very much depends on the
  characteristics of your documents; it often happens that the algorithms
  create larger numbers of smaller clusters. However, all returned documents
  should get assigned to some cluster(s), the "Other Topics" one in the worst
  case. Does that hold in your case?
 
 If you'd like to tune clustering a bit, you can try Carrot2 tools:
 
 http://download.carrot2.org/stable/manual/#section.getting-started.solr
 
 and then:
 
 http://download.carrot2.org/stable/manual/#chapter.tuning
 
 Cheers,
 
 S.




Re: Duplicates

2010-07-22 Thread Erick Erickson
If the field is a single token, just define the uniqueKey on it in your
schema.

Otherwise, this may be of interest:
http://wiki.apache.org/solr/Deduplication

Haven't used it myself though...

best
Erick

On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov char...@gmail.com wrote:

 Hi,

 Is it possible to remove duplicates in search results by a given field?

 Thanks.

 --
 Pavel Minchenkov



DIH stalling, how to debug?

2010-07-22 Thread Tommy Chheng

 Hi,
When I run my DIH script, it says it's busy but the "Total Requests 
made to DataSource" and "Total Rows Fetched" remain unchanged at 4 and 
6. It hasn't reported a failure.


How can I debug what is blocking the DIH?

--

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com



Re: DIH stalling, how to debug?

2010-07-22 Thread Tommy Chheng

 Ok, it was a runaway SQL query which isn't using an index.

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 7/22/10 4:26 PM, Tommy Chheng wrote:

 Hi,
When I run my DIH script, it says it's busy but the "Total Requests 
made to DataSource" and "Total Rows Fetched" remain unchanged at 4 and 
6. It hasn't reported a failure.


How can I debug what is blocking the DIH?



Re: filter query on timestamp slowing query???

2010-07-22 Thread Chris Hostetter

: You are correct, first of all i haven't move yet to the TrieDateField, but i
: am still waiting to find out a bit more information about it, and there's
: not a lot of info, other then in the xml file.

In general TrieFields are a way of trading disk space for range query 
speed.  They are explained fairly well if you look at the docs...

http://lucene.apache.org/solr/api/org/apache/solr/schema/TrieField.html
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/NumericRangeQuery.html

...although i realize now that TrieDateField's docs don't actually 
link to TrieField where the explanation is provided.

As for your use case...

: I'll explain my use case, so you'll know a bit more. I have an  index that's
: being updated regularly, (every second i have 10 to 50 new documents, most
: of them are small)
: 
: Every 30 minutes, i ask the index what are the documents that were added to
: it, since the last time i queried it, that match a certain criteria.
: From time to time, once a week or so, i ask the index for ALL the documents
: that match that criteria. (i also do this for not only one query, but
: several)
: This is why i need the timestamp filter.
: 
: The queries that don't have any time range, take a few seconds to finish,
: while the ones with time range, take a few minutes.
: Hope that helps understanding my situation, and i am open to any suggestion
: how to change the way things work, if it will improve performance.

you keep saying you run simple queries and gave an example of 
myStrField:foo and you say you ask the index what are the documents 
that were added to it, since the last time i queried it ... but you've 
never given any concrete example of a full Solr request that incorporates 
these timestamp filtering so we can see *exactly* what your requests look 
like.  Even with an index the size you are describing, and even with the 
slower performance of DateField compared to TrieDateField i find it hard 
to believe that a query for myStrField:foo would go from a few seconds 
to several minutes by adding an fq range query for a span of ~30 minutes.  
are you by any chance also *sorting* the documents by that timestamp field 
when you do this?

My best guess is that either:

  a) your raw query performance is generally really bad, but you don't 
notice when you do your simple queries because of solr's 
queryResultCache -- but this can't be used when you add the fq so you see 
the bad performance then.  If this is the situation I have no real 
suggestions

  b) when you do your individual requests that filter by your timestamp 
field you are also sorting by your timestamp field -- a field you don't 
ever sort on in any other queries so the filterCache needed for sorting 
needs to be built before those queries can be returned.  if you stop 
sorting on this timestamp field (or add a newSearcher warming query that 
does the same sort) then the problem should go away.



-Hoss



Re: Novice seeking help to change filters to search without diacritics

2010-07-22 Thread HSingh

Hoss, thank you for your helpful response!

: i think what's confusing you is that you are using the
: MappingCharFilterFactory with that file in your text field type to
: convert any ISOLatin1Accent characters to their base characters

The problem is that a large range of characters are not getting converted
to their base characters.  The ASCIIFoldingFilterFactory handles this
conversion for the entire Latin character set, including the extended sets,
without having to specify individual characters and their equivalent base
characters.

Is there a way for me to switch to ASCIIFoldingFilterFactory?  If so, what
changes do I need to make to these files?  I would appreciate your help!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p988890.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tree Faceting in Solr 1.4

2010-07-22 Thread rajini maski
I am also looking out for the same feature in Solr and am very keen to know
whether it supports this feature of tree faceting... Or are we forced to index
in a tree faceting format like:

1/2/3/4
1/2/3
1/2
1

In case of multilevel faceting, it will give only a 2-level tree facet, is what
i found..

If i give a query as: country India and state Karnataka and city
bangalore... All i want is a facet count 1) for the condition above, 2) the
number of states in that country, 3) the number of cities in that state...

Like = Country: India, State: Karnataka, City: Bangalore 1

 State:Karnataka
  Kerala
  Tamilnadu
  Andhra Pradesh... and so on

 City:  Mysore
  Hubli
  Mangalore
  Coorg and so on...


If I am doing
facet=on&facet.field={!ex=State}State&fq={!tag=State}State:Karnataka

all it gives me is facets on State, excluding only that filter query. But i
was not able to do the same at a third level, like: facet.field = give me the
counts of cities also in state Karnataka.
Let me know a solution for this... (a sketch of the syntax follows below)
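
A minimal SolrJ sketch of the two-level tagged/excluded-facet request 
described above (assumes an existing SolrServer named server; field names as 
in the example):

  ModifiableSolrParams p = new ModifiableSolrParams();
  p.set("q", "*:*");
  p.set("fq", "{!tag=st}State:Karnataka", "{!tag=ct}City:Bangalore");
  p.set("facet", true);
  p.add("facet.field", "{!ex=st}State");  // state counts ignore the State filter
  p.add("facet.field", "{!ex=ct}City");   // city counts ignore only the City filter
  QueryResponse rsp = server.query(p);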

Regards,
Rajani Maski





On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler impalah...@googlemail.comwrote:

 Thank you for the link.

 I was not aware of the multifaceting syntax - this will enable me to run 1
 less query on the main page!

 However this is not a tree faceting feature.

 Thanks
 Eric




 On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:

  Perhaps the following article can help:
 
 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
 
  -S
 
 
  On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
 
   Hi Solr Community
  
   If I have:
   COUNTRY CITY
   Germany Berlin
   Germany Hamburg
   Spain   Madrid
  
   Can I do faceting like:
   Germany
Berlin
Hamburg
   Spain
Madrid
  
   I tried to apply SOLR-792 to the current trunk but it does not seem to
 be
   compatible.
   Maybe there is a similar feature existing in the latest builds?
  
   Thanks & Regards
   Eric