Re: Use of solr.ASCIIFoldingFilterFactory

2010-02-08 Thread Yann PICHOT
Hello,

Thanks, your response solved my problem.

Thanks for everything,

On Sun, Feb 7, 2010 at 4:00 PM, Sven Maurmann sven.maurm...@kippdata.de wrote:

 Hi,

 you might have run into an encoding problem. If you use Tomcat as
 the container for Solr you should probably consult the following


  http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
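
 For reference, a minimal sketch of the setting that page describes; the
 port and any other Connector attributes are whatever your own server.xml
 already uses:

   <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>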

 Cheers,
Sven



 --On Friday, 5 February 2010 15:41 +0100 Yann PICHOT ypic...@gmail.com
 wrote:

  Hi,

 I have defined this type in my schema.xml file:

    <fieldType name="text" class="solr.TextField"
        positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

 Field definitions:

  <fields>
    <field name="id" type="string" indexed="true" stored="true"
        required="true"/>
    <field name="idProd" type="string" indexed="false" stored="false"
        required="false"/>
    <field name="description" type="text" indexed="true" stored="true"
        required="false"/>
    <field name="artiste" type="text" indexed="true" stored="true"
        required="false"/>
    <field name="collection" type="text" indexed="true" stored="true"
        required="false"/>
    <field name="titre" type="text" indexed="true" stored="true"
        required="false"/>
    <field name="all" type="text" indexed="true" stored="true"
        required="false"/>
  </fields>

  <copyField source="description" dest="all"/>
  <copyField source="collection" dest="all"/>
  <copyField source="artiste" dest="all"/>
  <copyField source="titre" dest="all"/>

 I have imported my documents with DataImportHandler (my original documents
 are in an RDBMS).

 I tested this query string on the Solr web application: all:chateau.
 Results (content of the field all):
  CHATEAU D'AMBOISE
  [CHATEAU EN FRANCE, BABELON]
  ope dvd rene chateau
  CHATEAU DE LA LOIRE
  DE CHATEAU EN CHATEAU ENTRE LA LOIRE ET LE CHER
  [LE CHATEAU AMBULANT, HAYAO MIYAZAKI]
  [Chambres d'hôtes au château, Moreau]
  [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL]
  [NEUF, NAISSANCE D UN CHATEAU FORT, MACAULAY]
  [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL]

 Now I try this query string: all:château.
 No result :(

 I don't understand. I expected the second query to return the same results
 as the first, but that is not the case.

 I use SOLR 1.4 (Solr Implementation Version: 1.4.0 833479 -
 grantingersoll - 2009-11-06 12:33:40).
 Java 32 bits : Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
 OS : Windows Seven 64 bits

 Regards,
 --
 Yann




-- 
Yann


DataImportHandler - case sensitivity of column names

2010-02-08 Thread Alexey Serba
I encountered a problem with Oracle converting column names to upper
case. As a result the SolrInputDocument is created with field names in
upper case and a "Document [null] missing required field: id" exception
is thrown (although the ID field is defined).

I do not specify <field> elements explicitly.

I know that I can rewrite all my queries to the select id as "id", body
as "body" from document format, but is there any other workaround for
this? A case-insensitive option or something?

Here's my data-config:
<dataConfig>
  <dataSource convertType="true"
      driver="oracle.jdbc.driver.OracleDriver" password="oracle"
      url="jdbc:oracle:thin:@localhost:1521:xe" user="SYSTEM"/>
  <document name="items">
    <entity name="root" pk="id" preImportDeleteQuery="db:db1"
        query="select id, body from document"
        transformer="TemplateTransformer">
      <entity name="nested1" query="select category from
          document_category where doc_id='${root.id}'"/>
      <entity name="nested2" query="select tag from document_tag where
          doc_id='${root.id}'"/>
      <field column="db" template="db1"/>
    </entity>
  </document>
</dataConfig>

Alexey


How to configure multiple data import types

2010-02-08 Thread stefan.maric
I have got a dataimport request handler configured to index data by selecting 
data from a DB view.

I now need to index additional data sets from other views so that I can support 
other search queries.

I defined additional <entity ...> definitions within the <document ...> section 
of my data-config.xml.
But I only seem to pull in data for the 1st <entity ...> and not both.


Is there an XSD (or DTD) for 
data-config.xml
schema.xml
solrconfig.xml

as these might help with understanding how to construct usable conf files?

Regards
Stefan Maric 
BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions


Re: SV: Running Solr (LucidWorks) as a Windows Server

2010-02-08 Thread Ron Chan
assuming you have the example running from the example folder in the standard 
distribution by doing java -jar start.jar, 

this is what I did to get the same running as a service 

download the jetty distribution (I used 6.1.21) 

copy the bin folder over to example 
copy etc\jetty-win32-service.xml to example\etc 
copy lib\win32 folder over to example\lib 

in example\bin\jetty-service.conf 
add this after wrapper.java.additional.2=-Djetty.logs=../logs 
wrapper.java.additional.3=-Dsolr.solr.home=../solr 

in example\solr\conf 
change this line 
<dataDir>${solr.data.dir:./solr/data}</dataDir> 
to 
<dataDir>${solr.data.dir:../solr/data}</dataDir> 
(double .. instead of single . before /solr) 

then as administrator from cmd prompt 
in the bin folder do 

Jetty-Service.exe --install jetty-service.conf 
net start jetty6-service 

HTH 
Ron 

- Original Message - 
From: Roland Villemoes r...@alpha-solutions.dk 
To: solr-user@lucene.apache.org 
Sent: Friday, 5 February, 2010 6:07:03 PM 
Subject: SV: Running Solr (LucidWorks) as a Windows Server 

Hi All, 

Thanks a lot for your help in this. 

I have tried to use the Win32Wrapper and the Jetty-Service.exe, but still no 
success. 
I was actually hoping that some of you guys out there had a running copy, so 
I could see how to configure it. 

Looks like it must go the Tomcat way... 

Roland 

-----Original Message----- 
From: Ron Chan [mailto:rc...@i-tao.com] 
Sent: 5 February 2010 12:55 
To: solr-user@lucene.apache.org 
Subject: Re: Running Solr (LucidWorks) as a Windows Server 

jetty can be run as a Windows Service, see 

http://docs.codehaus.org/display/JETTY/Win32Wrapper 


- Original Message - 
From: Roland Villemoes r...@alpha-solutions.dk 
To: solr-user@lucene.apache.org 
Sent: Thursday, 4 February, 2010 7:18:57 PM 
Subject: Running Solr (LucidWorks) as a Windows Server 

Hi, 

I need to have Solr/Jetty running as a Windows Service. 
I am using the Lucid distribution. 

Does anyone have a running example and tool for this? 


med venlig hilsen/best regards 




Re: How to configure multiple data import types

2010-02-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
are you referring to nested entities?
http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr

On Mon, Feb 8, 2010 at 5:42 PM,  stefan.ma...@bt.com wrote:
 I have got a dataimport request handler configured to index data by selecting 
 data from a DB view

 I now need to index additional data sets from other views so that I can 
 support other search queries

 I defined additional <entity ...> definitions within the <document ...> 
 section of my data-config.xml
 But I only seem to pull in data for the 1st <entity ...> and not both


 Is there an XSD (or DTD) for
        data-config.xml
        schema.xml
        solrconfig.xml

 as these might help with understanding how to construct usable conf files

 Regards
 Stefan Maric
 BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


RE: How to configure multiple data import types

2010-02-08 Thread stefan.maric
No, my views have already taken care of pulling the related data together.

I've indexed my first data set and now want to configure a second (non-related) 
data set, so that a user can issue a query for data set #1 whilst another user 
might be querying for data set #2.

Should I be defining multiple <document ...> or <entity ...> entries?
Or what?

Thanks
Stefan Maric 


Re: How to configure multiple data import types

2010-02-08 Thread Shalin Shekhar Mangar
On Mon, Feb 8, 2010 at 6:03 PM, stefan.ma...@bt.com wrote:

 No, my views have already taken care of pulling the related data together.

 I've indexed my first data set and now want to configure a second
 (non-related) data set, so that a user can issue a query for data set #1
 whilst another user might be querying for data set #2.

 Should I be defining multiple <document ...> or <entity ...> entries?
 Or what?


You can define multiple entities (all at the root level) to import all your
views at once.

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler - case sensitivity of column names

2010-02-08 Thread Shalin Shekhar Mangar
On Mon, Feb 8, 2010 at 3:59 PM, Alexey Serba ase...@gmail.com wrote:

 I encountered a problem with Oracle converting column names to upper
 case. As a result the SolrInputDocument is created with field names in
 upper case and a "Document [null] missing required field: id" exception
 is thrown (although the ID field is defined).

 I do not specify <field> elements explicitly.

 I know that I can rewrite all my queries to the select id as "id", body
 as "body" from document format, but is there any other workaround for
 this? A case-insensitive option or something?

 Here's my data-config:
 <dataConfig>
   <dataSource convertType="true"
       driver="oracle.jdbc.driver.OracleDriver" password="oracle"
       url="jdbc:oracle:thin:@localhost:1521:xe" user="SYSTEM"/>
   <document name="items">
     <entity name="root" pk="id" preImportDeleteQuery="db:db1"
         query="select id, body from document"
         transformer="TemplateTransformer">
       <entity name="nested1" query="select category from
           document_category where doc_id='${root.id}'"/>
       <entity name="nested2" query="select tag from document_tag where
           doc_id='${root.id}'"/>
       <field column="db" template="db1"/>
     </entity>
   </document>
 </dataConfig>


Fields are imported in a case-insensitive manner as long as they are not
specified explicitly. In this case, however, the problem is that the
${root.id} variable is case sensitive. There is no way right now to resolve
variables in a case-insensitive manner.
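
One workaround sketch, relying on Oracle's quoted-identifier behavior
(quoted aliases keep their case, so the resolved name stays lowercase), is
the rewrite mentioned in the original mail:

<entity name="root" pk="id" transformer="TemplateTransformer"
    query='select id as "id", body as "body" from document'>

With that, ${root.id} should resolve as expected in the nested entities.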

-- 
Regards,
Shalin Shekhar Mangar.


Request time out in solr

2010-02-08 Thread Vijayant Kumar
Hi
I indexed my Solr index with DIH.
I am using the WebService::Solr Perl module to update/delete my Solr index at
run time from the frontend.

I want to know how I can set a request timeout, either on the
WebService::Solr end or the Solr end, so that I can handle request timeout
exceptions.


-- 

Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211



Multi-word synonyms containing commas

2010-02-08 Thread Agethle, Matthias
Hi,

is it possible to have a synonym file where single synonyms can also contain 
commas, e.g. names like Washington, George?
Perhaps it would suffice to tell the SynonymFilterFactory to use another 
separator character (instead of the comma)?
I tried this and changed the line where the parseRules method is called in the 
original implementation of SynonymFilterFactory (simply replacing "," with "#"),
but this didn't work as expected.

Thanks
Matthias



unloading a solr core doesn't free any memory

2010-02-08 Thread Tim Terlegård
To me it doesn't look like unloading a Solr Core frees the memory that
the core has used. Is this how it should be?

I have a big index with 50 million documents. After loading a core it
takes 300 MB RAM. After a query with a couple of sort fields Solr
takes about 8 GB RAM. Then I unload (CoreAdminRequest.unloadCore) the
core. The core is not shown in /solr/ anymore. Solr still takes 8 GB
RAM. Creating new cores is super slow because I have hardly any memory
left. Do I need to free the memory explicitly somehow?

/Tim


RE: How to configure multiple data import types

2010-02-08 Thread Ken Lane (kenlane)
It sounds like you are doing it correctly, Stefan. It must be something
syntactical. The schema.xml and solrconfig.xml do not factor into your
problem, only the data-config.

I do the same thing you are trying to do. A watered down version is:

<dataConfig>
  <dataSource type="JdbcDataSource" 
      name="bdb-1" 
      driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@(DESCRIPTION = (LOAD_BALANCE = on)
          (FAILOVER = on) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
          server.domain.com)(PORT = 1528))) (CONNECT_DATA = (SERVICE_NAME =
          instance.domain.COM)))"
      user="scott" 
      password="tiger"/>
  <document name="monitors">
    <entity name="bdbmon" dataSource="bdb-1" query="SELECT column from
        table">
    </entity> 
    <entity name="bug" dataSource="bdb-1"
        query="SELECT another_column from another_table">
    </entity>
  </document>
</dataConfig>

Hope this helps...

-Original Message-
From: stefan.ma...@bt.com [mailto:stefan.ma...@bt.com] 
Sent: Monday, February 08, 2010 7:34 AM
To: solr-user@lucene.apache.org; noble.p...@gmail.com
Subject: RE: How to configure multiple data import types

No my views have already taken care of pulling the related data together


I've indexed my first data set and now want to configure a second
(non-related) data set so that a User can issue a query for data set #1
whilst another user might be querying for data set #2

Should I be defining multiple document .. or entity .. entries
Or what ??

Thanks
Stefan Maric 


trouble with DTD

2010-02-08 Thread Jens Kapitza

hi @all,

using Solr and the dataimport stuff to import ends up in a RuntimeException.

Caused by: java.lang.RuntimeException: 
[com.ctc.wstx.exc.WstxLazyException] 
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "eacute"

 at [row,col {unknown-source}]: [49,23]

Browsing the code shows that DTD support is disabled. Is there another way to 
get entity parsing to work?

Am I the only one using entities in XML?

I'm trying to import DBLP XML data into Solr.

--
Jens Kapitza


Re: trouble with DTD

2010-02-08 Thread Erick Erickson
Are you sure this isn't just a typo? eacute -> execute?

On Mon, Feb 8, 2010 at 9:15 AM, Jens Kapitza
j.kapi...@schwarze-allianz.de wrote:

 hi @all,

 using solr and dataimport stuff to import ends up in RuntimeException.

 Caused by: java.lang.RuntimeException: [com.ctc.wstx.exc.WstxLazyException]
 com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "eacute"
  at [row,col {unknown-source}]: [49,23]

 Browsing the code shows that DTD support is disabled. Is there another way to
 get entity parsing to work?
 Am I the only one using entities in XML?

 I'm trying to import DBLP XML data into Solr.

 --
 Jens Kapitza



Re: trouble with DTD

2010-02-08 Thread gwk

On 2/8/2010 3:15 PM, Jens Kapitza wrote:

hi @all,

using solr and dataimport stuff to import ends up in RuntimeException.

Caused by: java.lang.RuntimeException: 
[com.ctc.wstx.exc.WstxLazyException] 
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity eacute

 at [row,col {unknown-source}]: [49,23]

&eacute; is an entity defined for (X)HTML. XML itself only predefines &quot;, 
&amp;, &apos;, &lt; and &gt;, plus numeric &#...; character references. So if 
you want to use the é character you'll have to either use the character itself 
or something like &#x00c9;
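
For illustration, the usual place such an entity gets declared is the DTD, 
e.g. an internal subset like the following sketch (though, as noted in this 
thread, Solr's parser has DTD processing disabled, so this won't help here):

<!DOCTYPE dblp [
  <!ENTITY eacute "&#233;">
]>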


Regards,

gwk



Dynamic fields with more than 100 fields inside

2010-02-08 Thread Xavier Schepler

Hey,

I'm thinking about using dynamic fields.

I need one or more user-specific fields in my schema, 
for example concept_user_*, and I will maybe have more than 200 users 
using this feature.
Each user will send and retrieve values from its own field. It will then be 
used to filter results.


How would this impact query performance?
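
For reference, a minimal sketch of the schema.xml declaration this would 
need (the field type here is an assumption):

<dynamicField name="concept_user_*" type="string" indexed="true" stored="true"/>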

Thanks,

Xavier S.


Re: unloading a solr core doesn't free any memory

2010-02-08 Thread Simon Rosenthal
What garbage collection parameters is the JVM using? The memory will not
always be freed immediately after an event like unloading a core or starting
a new searcher.
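
For example, to watch whether the heap actually shrinks, the JVM can be
started with GC logging turned on; a sketch using standard Sun JVM flags:

java -verbose:gc -XX:+PrintGCDetails -jar start.jar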

2010/2/8 Tim Terlegård tim.terleg...@gmail.com

 To me it doesn't look like unloading a Solr Core frees the memory that
 the core has used. Is this how it should be?

 I have a big index with 50 million documents. After loading a core it
 takes 300 MB RAM. After a query with a couple of sort fields Solr
 takes about 8 GB RAM. Then I unload (CoreAdminRequest.unloadCore) the
 core. The core is not shown in /solr/ anymore. Solr still takes 8 GB
 RAM. Creating new cores is super slow because I have hardly any memory
 left. Do I need to free the memory explicitly somehow?

 /Tim



Call URL, simply parse the results using SolrJ

2010-02-08 Thread Jason Rutherglen
Sorry for the poorly worded title... For SOLR-1761 I want to pass in a
URL and parse the query response... However it's non-obvious to me how
to do this using the SolrJ API, hence asking the experts here. :)


Indexing / querying multiple data types

2010-02-08 Thread stefan.maric
OK - so I've now got my data-config.xml sorted so that I'm pulling in the 
expected number of indexed documents for my two data sets.

So I've defined two entities (name1 & name2) and they both make use of the same 
fields -- I'm not sure if this is a good thing to have done.

When I run a query I include qt=name1 (or qt=name2) and am expecting to only 
get the number of results from the appropriate data set -- in fact I'm getting 
the sum total from both.

Does the <entity name="name1"> equate to the query qt=name1?

In my solrconfig.xml I have defined two requestHandlers (name1 & name2) using 
the common set of fields.

So how do I ensure that my query
http://localhost:7001/solr/select/?q=food&qt=name1
or
http://localhost:7001/solr/select/?q=food&qt=name2

will operate on the correct data set as loaded via the data import -- <entity 
name="name1"> or <entity name="name2">?




Thanks
Stefan Maric 
BT Innovate & Design | Collaboration Platform - Customer Innovation Solutions
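
One sketch of how a handler could be tied to a single data set, assuming a 
marker field (here called dataset) is stamped onto each document during 
import, e.g. via a TemplateTransformer, and declared in schema.xml:

<requestHandler name="name1" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="fq">dataset:name1</str>
  </lst>
</requestHandler>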


Re: Multi-word synonyms containing commas

2010-02-08 Thread Ahmet Arslan
 Hi,
 
 is it possible to have a synonym file where single synonyms
 can also contain commas, e.g. names like Washington,
 George.

Sure, you just need to escape that comma, e.g. 
Washington\, George, wg
a\,a => b\,b


  


Re: Call URL, simply parse the results using SolrJ

2010-02-08 Thread Jason Rutherglen
So here's what happens if I pass in a URL with parameters, SolrJ chokes:

Exception in thread "main" java.lang.RuntimeException: Invalid base
url for solrj.  The base URL must not contain parameters:
http://locahost:8080/solr/main/select?q=video&qt=dismax
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:205)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:180)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:152)
at org.apache.solr.util.QueryTime.main(QueryTime.java:20)


On Mon, Feb 8, 2010 at 9:32 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Sorry for the poorly worded title... For SOLR-1761 I want to pass in a
 URL and parse the query response... However it's non-obvious to me how
 to do this using the SolrJ API, hence asking the experts here. :)



Re: Call URL, simply parse the results using SolrJ

2010-02-08 Thread Jason Rutherglen
Here's what I did to resolve this:

// imports needed for this snippet:
// java.net.URL, java.io.InputStreamReader,
// org.apache.solr.common.util.NamedList,
// org.apache.solr.client.solrj.impl.XMLResponseParser,
// org.apache.solr.client.solrj.response.QueryResponse

XMLResponseParser parser = new XMLResponseParser();
URL urlo = new URL(url);
InputStreamReader isr = new
InputStreamReader(urlo.openConnection().getInputStream());
NamedList<Object> namedList = parser.processResponse(isr);
QueryResponse response = new QueryResponse(namedList, null);
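
One small practical addition: the reader should be closed once the response 
has been read, e.g.

isr.close();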

On Mon, Feb 8, 2010 at 10:03 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 So here's what happens if I pass in a URL with parameters, SolrJ chokes:

 Exception in thread "main" java.lang.RuntimeException: Invalid base
 url for solrj.  The base URL must not contain parameters:
 http://locahost:8080/solr/main/select?q=video&qt=dismax
        at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:205)
        at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:180)
        at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.<init>(CommonsHttpSolrServer.java:152)
        at org.apache.solr.util.QueryTime.main(QueryTime.java:20)


 On Mon, Feb 8, 2010 at 9:32 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 Sorry for the poorly worded title... For SOLR-1761 I want to pass in a
 URL and parse the query response... However it's non-obvious to me how
 to do this using the SolrJ API, hence asking the experts here. :)




Re: Call URL, simply parse the results using SolrJ

2010-02-08 Thread Ahmet Arslan
 So here's what happens if I pass in a
 URL with parameters, SolrJ chokes:
 
 Exception in thread main java.lang.RuntimeException:
 Invalid base
 url for solrj.  The base URL must not contain
 parameters:
 http://locahost:8080/solr/main/select?q=video&qt=dismax

You can't pass a URL with parameters to the CommonsHttpSolrServer constructor.
You need to create a SolrQuery representing your parameters and values.
Your url can be translated into something like:

server = new CommonsHttpSolrServer("http://locahost:8080/solr/main/");

final SolrQuery query = new SolrQuery();
query.setQueryType("dismax");
query.setQuery("video");

final QueryResponse rsp = server.query(query);






RE: Multi-word synonyms containing commas

2010-02-08 Thread Agethle, Matthias

OK, that works (now I found it also in the example synonyms file...). 

But what if I overwrite the synonyms file after Solr startup? 
Is reloading the core the only way to pick up the change?
I am thinking of these steps:

1. Generate the new synonyms file
2. Reload the core and wait a minute
3. Re-index (as I'm using synonyms at index time)
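
For step 2, a core can be reloaded through the CoreAdmin handler; a sketch, 
assuming a core named core0 and the default port:

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0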


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, 08 February 2010 18:48
To: solr-user@lucene.apache.org
Subject: Re: Multi-word synonyms containing commas

 Hi,
 
 is it possible to have a synonym file where single synonyms can also 
 contain commas, e.g. names like Washington, George.

Sure, you just need to escape that comma, e.g. 
Washington\, George, wg
a\,a => b\,b


  


Re: Call URL, simply parse the results using SolrJ

2010-02-08 Thread Jason Rutherglen
Ahmet,  Thanks, though that isn't quite what I was going for, and it's
resolved besides...

On Mon, Feb 8, 2010 at 10:24 AM, Ahmet Arslan iori...@yahoo.com wrote:
 So here's what happens if I pass in a
 URL with parameters, SolrJ chokes:

 Exception in thread main java.lang.RuntimeException:
 Invalid base
 url for solrj.  The base URL must not contain
 parameters:
 http://locahost:8080/solr/main/select?q=video&qt=dismax

 You can't pass a URL with parameters to the CommonsHttpSolrServer constructor.
 You need to create a SolrQuery representing your parameters and values.
 Your url can be translated into something like:

 server = new CommonsHttpSolrServer("http://locahost:8080/solr/main/");

 final SolrQuery query = new SolrQuery();
 query.setQueryType("dismax");
 query.setQuery("video");

 final QueryResponse rsp = server.query(query);







Trouble parsing XML from replication?command=status

2010-02-08 Thread Jason Rutherglen
Via Firefox on Ubuntu I downloaded the results of
replication?command=status to a file, then wrote a little app to parse
the XML.  Unfortunately it's not parsing.  I'm wondering if it's
because it's in XML, which nothing in Solr parses (SnapPuller, for
example, uses javabin).

Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,1908]
Message: error reading value:LST
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:319)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:240)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:239)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 3 more


Re: Collating results from multiple indexes

2010-02-08 Thread Jan Høydahl / Cominvent
Hi,

There is no JOIN functionality in Solr. The common solution is either to accept 
the high-volume update churn, or to add client-side code to build a join 
layer on top of the two indices. I know that Attivio (www.attivio.com) have 
built some kind of JOIN functionality on top of Solr in their AIE product, but 
I do not know the details or the actual performance.

Why not open a JIRA issue, if there is not one already, to request this as a 
feature?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 25. jan. 2010, at 22.01, Aaron McKee wrote:

 
 Is there any somewhat convenient way to collate/integrate fields from 
 separate indices during result writing, if the indices use the same unique 
 keys? Basically, some sort of cross-index JOIN?
 
 As a bit of background, I have a rather heavyweight dataset of every US 
 business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours to 
 fully index on a decent box). Given the size and relative stability of the 
 dataset, I generally only update this monthly. However, I have separate 
 advertising-related datasets that need to be updated either hourly or daily 
 (e.g. today's coupon, click revenue remaining, etc.) . These advertiser feeds 
 reference the same keyspace that I use in the main index, but are otherwise 
 significantly lighter weight. Importing and indexing them discretely only 
 takes a couple minutes. Given that Solr/Lucene doesn't support field 
 updating, without having to drop and re-add an entire document, it doesn't 
 seem practical to integrate this data into the main index (the system would 
 be under a constant state of churn, if we did document re-inserts, and the 
 performance impact would probably be debilitating). It may be nice if this 
 data could participate in filtering (e.g. only show advertisers), but it 
 doesn't need to participate in scoring/ranking.
 
 I'm guessing that someone else has had a similar need, at some point?  I can 
 have our front-end query the smaller indices separately, using the keys 
 returned by the primary index, but would prefer to avoid the extra sequential 
 roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the 
 maintenance overhead as we drop in new builds of Solr, but that's also 
 feasible.
 
 Thank you for your insight,
 Aaron
 



DataImportHandler

2010-02-08 Thread Sean Timm
It looks like the dataimporter.functions.escapeSql(String) function 
escapes quotes, but fails to escape '\' characters, which are problematic 
especially when the field value ends in a \.  Also, on failure, I get an 
alarming notice of a possible resource leak.  I couldn't find Jira 
issues for either.


-Sean

(field names and data below have been sanitized)

config query line:
query="SELECT SUM(fielda) AS A, SUM(fieldb) AS B FROM tablea where 
fieldc='${dataimporter.functions.escapeSql(outer_entity.fieldc)}'"


SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: SELECT SUM(fielda) AS A, SUM(fieldb) AS B FROM tablea 
where fieldc='somedata\' Processing Document # 1587
   at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
   at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
   at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
   at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: You have 
an error in your SQL syntax; check the manual that corresponds to your 
MySQL server version for the right syntax to use near ''somedata\'' at 
line 1

   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:936)
   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
   at com.mysql.jdbc.Connection.execSQL(Connection.java:3277)
   at com.mysql.jdbc.Connection.execSQL(Connection.java:3206)
   at com.mysql.jdbc.Statement.execute(Statement.java:727)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)

   ... 12 more
Feb 8, 2010 3:22:51 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Feb 8, 2010 3:22:51 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Feb 8, 2010 3:22:53 PM org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a 
bug -- POSSIBLE RESOURCE LEAK!!!





Re: Trouble parsing XML from replication?command=status

2010-02-08 Thread Jason Rutherglen
javabin parses fine, which leads me to believe there's a bug
lurking... though I'm not going to spend time solving it.

On Mon, Feb 8, 2010 at 12:18 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Via Firefox on Ubuntu I downloaded the results of
 replication?command=status to a file, then wrote a little app to parse
 out the XML.  Unfortunately it's not parsing.  I'm wondering if it's
 because it's in XML, which nothing in Solr parses (SnapPuller for
 example is using javabin).

 Caused by: javax.xml.stream.XMLStreamException: ParseError at 
 [row,col]:[3,1908]
 Message: error reading value:LST
        at 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:319)
        at 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:240)
        at 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:239)
        at 
 org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
        ... 3 more



TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-08 Thread Burton-West, Tom
Hello all,

After optimizing rather large indexes on 10 shards (each index holds about 
500,000 documents and is about 270-300 GB in size), we started getting 
intermittent TermInfosReader.get() ArrayIndexOutOfBounds exceptions. The 
exceptions sometimes seem to occur on all 10 shards at the same time and 
sometimes on one shard but not the others. We also sometimes get an Internal 
Server Error, but that might be either a cause or an effect of the array index 
out of bounds. Here is the top part of the message:


java.lang.ArrayIndexOutOfBoundsException: -14127432
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)

Any suggestions for troubleshooting would be appreciated.

Trace from tomcat logs appended below.

Tom Burton-West

---

Feb 5, 2010 8:09:02 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -14127432
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:943)
at 
org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
at 
org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:144)
at org.apache.lucene.search.Similarity.idf(Similarity.java:481)
at 
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:44)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:186)
at 
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:366)
at org.apache.lucene.search.Query.weight(Query.java:95)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at 
org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:651)
at 
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
at 
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)

Feb 5, 2010 8:09:02 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://solr-sdr-search-10:8081/serve-10/select
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:423)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at 

Re: Embedded Solr problem

2010-02-08 Thread Sven Maurmann

Hi Ranveer,

I assume that you have enough knowledge of Java. You should essentially run
the code that instantiates the server only once (depending on what you intend
to do, this may be done in a separate class or in a method of the class doing
the queries). Then you use this instance to handle all the queries, using
for example the query method of SolrServer.

For further information you may want to consult either the API documentation
or the url http://wiki.apache.org/solr/Solrj from the wiki.

Cheers,
   Sven
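
A minimal sketch of that pattern (the class name is illustrative, and
createEmbeddedServer() stands in for the setup code quoted below):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrHolder {
    private static SolrServer server;

    // Run the expensive CoreContainer/EmbeddedSolrServer setup only once.
    public static synchronized SolrServer getServer() throws Exception {
        if (server == null) {
            server = createEmbeddedServer();
        }
        return server;
    }

    // Every search reuses the single shared instance.
    public static QueryResponse search(String q) throws Exception {
        return getServer().query(new SolrQuery(q));
    }

    private static SolrServer createEmbeddedServer() throws Exception {
        // the CoreContainer/SolrCore setup from the quoted mail goes here
        throw new UnsupportedOperationException("see setup in quoted mail");
    }
}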

--On Monday, 8 February 2010 08:53 +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:



Hi Sven,
thanks for reply.

Yes, I noticed that a new instance of the Solr server is created on every
request.
Could you please guide me on how to do the same (initialization to create an
instance of SolrServer once, during the first request)?


On Mon, Feb 8, 2010 at 2:11 AM, Sven Maurmann
sven.maurm...@kippdata.de wrote:


Hi,

would it be possible that you instantiate a new instance of your
SolrServer every time you do a query?

You should use the code you quoted in your mail once during
initialization to create an instance of SolrServer (the interface being
implemented by EmbeddedSolrServer) and subsequently use the query method
of SolrServer to do the query.

Cheers,
   Sven


--On Sunday, 7 February 2010 21:54 +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:

 Hi All,


I am still very new to solr.
Currently I am facing problem to use EmbeddedSolrServer.
following is my code:

   File home = new File("D:/ranveer/java/solr_home/solr/first");
CoreContainer coreContainer = new CoreContainer();
SolrConfig config = null;
config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
CoreDescriptor descriptor = new
CoreDescriptor(coreContainer, "core1", home + "/core1");
SolrCore core = new SolrCore("core1", home + "/core1/data", config, new
IndexSchema(config, "schema.xml", null), descriptor);
coreContainer.register(core.getName(), core, true);
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer,
"core1");

Now my problem is that every time I make a search request, SolrCore
initializes the core.
I want it so that if the core is already started, the previously
started core is simply reused.
Due to this problem, searching is currently taking too much time.
I tried to close the core after searching, but the same thing happens: when a
fresh search is made, Solr starts again from scratch.

please help..
thanks





Re: Indexing / querying multiple data types

2010-02-08 Thread Sven Maurmann

Hi,

could you be a little more precise about your configuration?
It may be much easier to answer your question then.

Cheers,
Sven

--On Monday, 8 February 2010 17:39 + stefan.ma...@bt.com wrote:


OK - so I've now got my data-config.xml sorted so that I'm pulling in the
expected number of indexed documents for my two data sets.

So I've defined two entities (name1 & name2) and they both make use of
the same fields -- I'm not sure if this is a good thing to have done.

When I run a query I include qt=name1 (or qt=name2) and am expecting to
only get the number of results from the appropriate data set -- in fact
I'm getting the sum total from both.

Does the <entity name="name1"> equate to the query qt=name1?

In my solrconfig.xml I have defined two requestHandlers (name1 & name2)
using the common set of fields.

So how do I ensure that my query
http://localhost:7001/solr/select/?q=food&qt=name1
or
http://localhost:7001/solr/select/?q=food&qt=name2

will operate on the correct data set as loaded via the data import --
<entity name="name1"> or <entity name="name2">?




Thanks
Stefan Maric
BT Innovate & Design | Collaboration Platform - Customer Innovation
Solutions


DataImportHandler can't understand query

2010-02-08 Thread javaxmlsoapdev

I have a complex query (which runs fine in the database) that I am trying to
include in a DataImportHandler query. The query has case statements with <>
in it,

e.g.

case when (ASSIGNED_TO <> '' and TRANSLATE(ASSIGNED_TO, '',
'0123456789')='')

DataImportHandler fails to parse the query, with the following error
complaining about the < symbol. How do I go about this? Note: the query is
valid and runs fine in the database.

[Fatal Error] :26:26: The value of attribute "query" associated with an
element type "entity" must not contain the '<' character.
Feb 8, 2010 6:02:09 PM org.apache.solr.handler.dataimport.DataImportHandler
inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
occurred while initializing context
at
org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)

Thanks,
-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27507918.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler can't understand query

2010-02-08 Thread javaxmlsoapdev

Note: I already tried to escape the < character with \ but it still throws the
same error.

Any idea?

Thanks,

javaxmlsoapdev wrote:
 
 I have a complex query (which runs fine in the database) that I am trying to
 include in a DataImportHandler query. The query has case statements with <>
 in it,
 
 e.g.
 
 case when (ASSIGNED_TO <> '' and TRANSLATE(ASSIGNED_TO, '',
 '0123456789')='')
 
 DataImportHandler fails to parse the query, with the following error
 complaining about the < symbol. How do I go about this? Note: the query is
 valid and runs fine in the database.
 
 [Fatal Error] :26:26: The value of attribute "query" associated with an
 element type "entity" must not contain the '<' character.
 Feb 8, 2010 6:02:09 PM
 org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
 occurred while initializing context
   at
 org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
 
 Thanks,
 

-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27508214.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DataImportHandler can't understand query

2010-02-08 Thread Shah, Nirmal
Did you try &lt; and &gt;?

Nirmal Shah
Remedy Consultant|Column Technologies|Cell: (630) 244-1648
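
A sketch of what that looks like inside data-config.xml, with the operator
written as XML entities (the surrounding query text is from the original
mail):

<entity name="..."
    query="... case when (ASSIGNED_TO &lt;&gt; '' and
           TRANSLATE(ASSIGNED_TO, '', '0123456789')='') ...">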

-Original Message-
From: javaxmlsoapdev [mailto:vika...@yahoo.com] 
Sent: Monday, February 08, 2010 5:42 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler can't understand query


Note: I already tried to escape the < character with \ but it still throws
the same error.

Any idea?

Thanks,

javaxmlsoapdev wrote:
 
 I have a complex query (which runs fine in the database) that I am trying to
 include in a DataImportHandler query. The query has case statements with <>
 in it,
 
 e.g.
 
 case when (ASSIGNED_TO <> '' and TRANSLATE(ASSIGNED_TO, '',
 '0123456789')='')
 
 DataImportHandler fails to parse the query, with the following error
 complaining about the < symbol. How do I go about this? Note: the query
 is valid and runs fine in the database.
 
 [Fatal Error] :26:26: The value of attribute "query" associated with an
 element type "entity" must not contain the '<' character.
 Feb 8, 2010 6:02:09 PM
 org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException:
Exception
 occurred while initializing context
   at

org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImpor
ter.java:190)
 
 Thanks,
 

-- 
View this message in context:
http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507
918p27508214.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr multicore and nfs

2010-02-08 Thread Lance Norskog
Solr generally does not work well over NFS. This looks like a
transient NFS error; apps have to assume that NFS will randomly fail
and that they have to try again.

This may be due to a locking problem. There is a LockFactory class in
Lucene that controls how indexes are shared between programs. Solr
includes a control for this in solrconfig.xml. The java apps and Solr
apps have to agree on the lock strategy. And it has to work over NFS.

http://wiki.apache.org/lucene-java/AvailableLockFactories

solrconfig.xml:


<!--
  As long as Solr is the only process modifying your index, it is
  safe to use Lucene's in process locking mechanism.  But you may
  specify one of the other Lucene LockFactory implementations in
  the event that you have a custom situation.

  none = NoLockFactory (typically only used with read only indexes)
  single = SingleInstanceLockFactory (suggested)
  native = NativeFSLockFactory
  simple = SimpleFSLockFactory

  ('simple' is the default for backwards compatibility with Solr 1.2)
-->
<lockType>single</lockType>

On Thu, Feb 4, 2010 at 6:54 AM, Valérie TAESCH v.tae...@greenivory.com wrote:
 Hello,

 We are using Solr (v 1.3.0 694707 with Lucene version 2.4-dev 691741) in
 multicore mode with an average of 400 indexes (all indexes have the same
 structure).
 These indexes are stored on an NFS disk.
 A Java process writes continuously to these indexes while Solr is only used
 to read them.

 We often got this exception :
 HTTP Status 500 - No such file or directory java.io.IOException: No such
 file or directory at java.io.RandomAccessFile.readBytes(Native Method) at
 java.io.RandomAccessFile.read(RandomAccessFile.java:322) at
 org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
 at
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
 at
 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247)
 at
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
 at
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
 at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78) at
 org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64) at
 org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127) at
 org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158) at
 org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:270) at
 org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217) at
 org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:744) at
 org.apache.lucene.index.MultiSegmentReader.docFreq(MultiSegmentReader.java:375)
 at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:87) at
 org.apache.lucene.search.Similarity.idf(Similarity.java:457) at
 org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:44) at
 org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146) at
 org.apache.lucene.search.Query.weight(Query.java:95) at
 org.apache.lucene.search.Searcher.createWeight(Searcher.java:185) at
 org.apache.lucene.search.Searcher.search(Searcher.java:126) at
 org.apache.lucene.search.Searcher.search(Searcher.java:105) at
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 

Re: Is it possible to exclude results from other languages?

2010-02-08 Thread Lance Norskog
There is

On Thu, Feb 4, 2010 at 10:07 AM, Raimon Bosch raimon.bo...@gmail.com wrote:


 Yes, it's true that we could do it at index time if we had a way to know. I
 was thinking of some solution at search time, maybe measuring the % of
 stopwords in each document. Normally, a document in another language won't
 have any stopwords of the index's main language.

 If you know of some external software to detect the language of a source
 text, it would be useful too.

 Thanks,
 Raimon Bosch.



 Ahmet Arslan wrote:


 In our indexes, sometimes we have some documents written in languages
 different from the index's most common language. Is there any way to give
 less boosting to these documents?

 If you are aware of those documents, at index time you can boost those
 documents with a value less than 1.0:

 <add>
   <doc boost="0.5">
     <!-- document written in other languages -->
     <field name="..">..</field>
     <field name="..">..</field>
   </doc>
 </add>

 http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22






 --
 View this message in context: 
 http://old.nabble.com/Is-it-posible-to-exclude-results-from-other-languages--tp27455759p27457165.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


Re: source tree for lucene

2010-02-08 Thread Lance Norskog
The Solr trunk/lib directory contains lucene libs as of January 18.
(They were checked in on that date.)

On Thu, Feb 4, 2010 at 4:28 PM, Joe Calderon calderon@gmail.com wrote:
 I want to recompile Lucene with
 http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure
 which source tree to use. I tried using the implied trunk revision
 from the admin/system page, but Solr fails to build with the generated
 jars, even if I exclude the patches from 2230...

 I'm wondering if there is another Lucene tree I should grab to use to build 
 Solr?


 --joe




-- 
Lance Norskog
goks...@gmail.com


Re: Fundamental questions of how to build up solr for huge portals

2010-02-08 Thread Lance Norskog
In general, search is disk-intensive. Lots of RAM (up to 32G at
today's prices) and fast hard disks matter. For administration, the
single biggest disruptor is updating the index. If you can keep index
updates to off-peak hours it will be OK. If not, index on one server
and serve queries from another. Use local hard disks.

http://wiki.apache.org/solr/SolrPerformanceData

http://wiki.apache.org/solr/SolrPerformanceFactors

On Fri, Feb 5, 2010 at 5:19 AM, Fuad Efendi f...@efendi.ca wrote:
 - what's the best way to use Solr to get the best performance for a huge
 portal with 5000 users that might expand quickly?

 5000 users:
 200 TPS, for instance, is equal to 1200 concurrent users (each user makes 1
 request per minute), so a single SOLR instance is more than enough.

 Why 200 TPS? It is the bottom line, for fuzzy search (I recently improved it).

 In real life, on real hardware, 1000 TPS (using caching, not frequently using
 fuzzy search, etc.), which is equal to 6000 concurrent users, and subsequently
 to more than 600,000 total users.

 The rest depends on your design...

 If you have separate portals A, B, C - create a field with values A, B, C.

 Liferay Portal nicely integrates with SOLR... each kind of portlet object
 (forum post, document, journal article, etc.) can implement searchable and
 be automatically indexed. But Liferay is Java-based, JSR-168, JSR-286 (and
 it supports PHP portlets, but I never tried).

 Fuad Efendi
 +1 416-993-2060
 http://www.linkedin.com/in/liferay


 -Original Message-
 From: Peter [mailto:zarato...@gmx.net]
 Sent: January-16-10 10:17 AM
 To: solr-user@lucene.apache.org
 Subject: Fundamental questions of how to build up solr for huge portals

 Hello!

 Our team wants to use Solr for a community portal built up out of 3 and
 more sub-portals. We are unsure in which way we should build up the whole
 architecture, because we have more than one portal and we want to make
 them all connected and searchable by Solr. Could some experts help us with
 these questions?

 - what's the best way to use Solr to get the best performance for a huge
 portal with 5000 users that might expand quickly?
 - which client to use (Java, PHP...)? Right now the portal is almost
 entirely PHP/MySQL based. But we want to make Solr as good as it can be in
 all ways (performance, accessibility, good programming practice, using all
 the features of Lucene - like tagging, faceting and so on...)


 We are thankful of every suggestions :)

 Thanks,
 Peter






-- 
Lance Norskog
goks...@gmail.com


Re: Slow QueryComponent.process() when queries have numbers in them

2010-02-08 Thread Lance Norskog
The single-digit numbers are probably in all of the docs. You might want
to rip them out with a SynonymFilter. The more docs a query
finds, the longer the query takes.

On Fri, Feb 5, 2010 at 1:23 PM, Simon Wistow si...@thegestalt.org wrote:
 On Wed, Feb 03, 2010 at 07:38:13PM -0800, Lance Norskog said:
 The debugQuery parameter shows you how the query is parsed into a tree
 of Lucene query objects.

 Well, that's kind of what I'm asking - I know how the query is being
 parsed:

 <str name="rawquerystring">myers 8e psychology chapter 9</str>

 <str name="querystring">myers 8e psychology chapter 9</str>

 <str name="parsedquery">
 +((DisjunctionMaxQuery((content:myer^0.8 | title:myer^1.5)~0.01)
 DisjunctionMaxQuery((content:"8 e"~2^0.8 | title:"8 e"~2^1.5)~0.01)
 DisjunctionMaxQuery((content:psycholog^0.8 | title:psycholog^1.5)~0.01)
 DisjunctionMaxQuery((content:chapter^0.8 | title:chapter^1.5)~0.01)
 DisjunctionMaxQuery((content:9^0.8 | title:9^1.5)~0.01))~4) ()
 </str>

 <str name="parsedquery_toString">
 +(((content:myer^0.8 | title:myer^1.5)~0.01 (content:"8 e"~2^0.8 |
 title:"8 e"~2^1.5)~0.01 (content:psycholog^0.8 |
 title:psycholog^1.5)~0.01 (content:chapter^0.8 | title:chapter^1.5)~0.01
 (content:9^0.8 | title:9^1.5)~0.01)~4) ()
 </str>

 But that's sort of beside the point - I was really asking whether this is a
 known issue (i.e. queries with numbers in them can be very slow) and
 whether there are any workarounds.









-- 
Lance Norskog
goks...@gmail.com


Re: unloading a solr core doesn't free any memory

2010-02-08 Thread Lance Norskog
The 'jconsole' program lets you monitor GC operation in real-time.

http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html

On Mon, Feb 8, 2010 at 8:44 AM, Simon Rosenthal
simon_rosent...@yahoo.com wrote:
 What garbage collection parameters is the JVM using? The memory will not
 always be freed immediately after an event like unloading a core or starting
 a new searcher.

 2010/2/8 Tim Terlegård tim.terleg...@gmail.com

 To me it doesn't look like unloading a Solr Core frees the memory that
 the core has used. Is this how it should be?

 I have a big index with 50 million documents. After loading a core it
 takes 300 MB RAM. After a query with a couple of sort fields Solr
 takes about 8 GB RAM. Then I unload (CoreAdminRequest.unloadCore) the
 core. The core is not shown in /solr/ anymore. Solr still takes 8 GB
 RAM. Creating new cores is super slow because I have hardly any memory
 left. Do I need to free the memory explicitly somehow?

 /Tim





-- 
Lance Norskog
goks...@gmail.com


Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-08 Thread Lance Norskog
The index is corrupted. In some places ArrayIndexOutOfBoundsException and NPE
are not wrapped as CorruptIndexException.

Try running your code with the Lucene assertions on. Add this to the
JVM arguments:  -ea:org.apache.lucene...
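
It may also be worth running Lucene's index checker over each shard; a
sketch, where the classpath and index path are assumptions for your install:

java -ea:org.apache.lucene... -cp lucene-core.jar \
    org.apache.lucene.index.CheckIndex /path/to/index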

On Mon, Feb 8, 2010 at 1:02 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Hello all,

 After optimizing rather large indexes on 10 shards (each index holds about 
 500,000 documents and is about 270-300 GB in size), we started getting 
 intermittent TermInfosReader.get() ArrayIndexOutOfBounds exceptions. The 
 exceptions sometimes seem to occur on all 10 shards at the same time and 
 sometimes on one shard but not the others. We also sometimes get an 
 Internal Server Error, but that might be either a cause or an effect of 
 the array index out of bounds. Here is the top part of the message:


 java.lang.ArrayIndexOutOfBoundsException: -14127432
        at 
 org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)

 Any suggestions for troubleshooting would be appreciated.

 Trace from tomcat logs appended below.

 Tom Burton-West

 ---

 Feb 5, 2010 8:09:02 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -14127432
        at 
 org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)
        at 
 org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
        at 
 org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:943)
        at 
 org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
        at 
 org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:144)
        at org.apache.lucene.search.Similarity.idf(Similarity.java:481)
        at 
 org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:44)
        at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
        at 
 org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:186)
        at 
 org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:366)
        at org.apache.lucene.search.Query.weight(Query.java:95)
        at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:651)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
        at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)

 Feb 5, 2010 8:09:02 AM org.apache.solr.common.SolrException log
 SEVERE: 

Re: Dynamic fields with more than 100 fields inside

2010-02-08 Thread Shalin Shekhar Mangar
On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler 
xavier.schep...@sciences-po.fr wrote:

 Hey,

 I'm thinking about using dynamic fields.

 I need one or more user-specific fields in my schema, for example
 concept_user_*, and I will maybe have more than 200 users using this
 feature.
 Each user will send and retrieve values from its own field. It will then be
 used to filter results.

 How would this impact query performance?


Can you give an example of such a query?

-- 
Regards,
Shalin Shekhar Mangar.