Exception while processing: attach document

2010-10-29 Thread Bac Hoang

 Hello all,

I'm getting stuck when trying to import an Oracle DB into the Solr index;
could any of you give a hand? Thanks a million.


Below is some short info that may help frame the question.

My Solr: 1.4.1

=== LOG ===
INFO: Starting Full Import
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity attach with URL: jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: attach document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from A.B Processing Document # 1


where A: a schema
B: a table

=== dataSource ===
<dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22" user="abc" password="xyz"
            readOnly="true" autoCommit="false" batchSize="1"/>
<document>
  <entity dataSource="jdbc" name="attach" query="select * from A.B">
    <entity processor="SqlEntityProcessor" dataField="attach.TOPIC" format="text">
      <field column="text" name="text" />
    </entity>
  </entity>
</document>

where TOPIC is a field of table B

Thanks again



Re: Looking for Developers

2010-10-29 Thread 朱炎詹
When I first saw this particular email, I wrote a letter intending to ask the 
sender to remove solr-user from its recipients, because I thought this should go 
to solr-dev. But then I thought again: it's about a 'job offer', not 'development 
of Solr', so I just deleted my email.


Maybe solr-job is a good suggestion. A selfish reason for this suggestion is 
that I'm also looking for someone familiar with Solr to work for me in 
Taiwan, and I really don't know where to ask.


Scott

- Original Message - 
From: Dennis Gearon gear...@sbcglobal.net

To: solr-user@lucene.apache.org; doh...@gmail.com
Sent: Friday, October 29, 2010 4:28 AM
Subject: Re: Looking for Developers


Hey! I represent those remarks! I was on that committee (really) because I 
am/was a:


  http://www.rhyolite.com/anti-spam/you-might-be.html#spam-fighter

 and about 20 other 'types' on that list. I'm a little bit more mature, but 
only a little. White lists are the only way to go.



Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 
'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
 otherwise we all die.


--- On Thu, 10/28/10, Ken Stanley doh...@gmail.com wrote:


From: Ken Stanley doh...@gmail.com
Subject: Re: Looking for Developers
To: solr-user@lucene.apache.org
Date: Thursday, October 28, 2010, 12:33 PM
On Thu, Oct 28, 2010 at 2:57 PM,
Michael McCandless 
luc...@mikemccandless.com
wrote:

 I don't think we should do this until it becomes a
real problem.

 The number of job offers is tiny compared to dev
emails, so far, as
 far as I can tell.

 Mike


By the time that it becomes a real problem, it would be too
late to get
people to stop spamming the -user mailing list; no?

- Ken













Re: Looking for Developers

2010-10-29 Thread Gora Mohanty
On Fri, Oct 29, 2010 at 12:23 PM, scott chu (朱炎詹)
scott@udngroup.com wrote:
 When I first saw this particular email, I wrote a letter intending to ask the
 sender to remove solr-user from its recipients, because I thought this should go to
 solr-dev. But then I thought again: it's about a 'job offer', not 'development
 of Solr', so I just deleted my email.

To add more with regards to the original mail that started this thread: We are
based in India, and for the first mail, I replied to the person
off-list offering our
services, but never got a reply. So, I wonder how serious this guy was in the
first place.

 Maybe solr-job is a good suggestion. A selfish reason for this suggestion is
 that I'm also looking for someone familiar with Solr to work for me in
 Taiwan, and I really don't know where to ask.

In other lists with a broader audience, such as a local Linux users list, our
practice has been that job offers are tolerated if posted once, and marked
as Commercial in the subject header. Given the low volume of such posts
in this list, maybe that could be an acceptable solution? We would also be
happy with a separate solr-jobs list.

Regards,
Gora


Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
Hi Everybody,

It seems that the maximum query length supported by the Dismax Query Handler is 
3534 characters. Is there any way I can set this limit to around 12,000?

If I fire a query beyond 3534 characters, I don't even get error messages in 
the catalina.XXX log files.

Swapnonil Mukherjee
+91-40092712
+91-9007131999





Re: QueryElevation Component is so slow

2010-10-29 Thread Chamnap Chhorn
Does anyone have suggestions to improve the search?
Thanks.

On 10/28/10, Chamnap Chhorn chamnapchh...@gmail.com wrote:
 Sorry for the very bad pasting; I'll paste it again.

 Slowest Components                                      Count  Exclusive         Total
 QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
 SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
 org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
 QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
 DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
 FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
 chamnapchh...@gmail.com wrote:

 Hi,

 I'm using solr 1.4 and using QueryElevation Component for guaranteed
 search
 position. I have around 700,000 documents with 1 Mb elevation file. It
 turns
 out it is quite slow on the newrelic monitoring website:

 Slowest Components                                      Count  Exclusive         Total
 QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
 SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
 org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
 QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
 DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
 FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 As you can see, QueryElevationComponent takes quite a lot of time. Any
 suggestions on how to improve this?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Newbie to Solr, LIKE:foo

2010-10-29 Thread MilleBii
I'm a Nutch user, but I'm considering using Solr for the following reason.

I need a LIKE:foo query, which turns into a *foo* query. I saw the built-in
prefix query parser, but it only looks for foo*, if I understand it well.
So is there a query parser that does what I'm looking for?
If not, how difficult is it to build one with Solr?
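
One common workaround, sketched below on the assumption that an analysis-side fix is acceptable (the field type name is illustrative), is to index the field with an n-gram filter so that a plain term query for foo matches inside tokens, much like *foo*:

<fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every 3- to 15-character substring of each token -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The index grows noticeably, and query terms shorter than minGramSize or longer than maxGramSize won't match, so the gram sizes need tuning.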

-- 
-MilleBii-


Re: Looking for Developers

2010-10-29 Thread Mark Allan
For me, I simply deleted the original email, but I'm now quite  
enjoying the irony of the complaints causing more noise on the list  
than the original email!  ;-)


M


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Possible bug in query sorting

2010-10-29 Thread Pablo Recio
That's my schema XML:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true" required="true" omitNorms="true"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        <filter class="solr.ISOLatin1AccentFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        <filter class="solr.ISOLatin1AccentFilterFactory" />
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <field name="text" type="text" indexed="true" stored="false" required="false" multiValued="false" omitNorms="false" />
    <field name="icms_collection" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false" />
    <field name="link" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false" />
    <field name="title" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false" />
    <field name="contributor" type="text" indexed="false" stored="false" required="false" multiValued="false" omitNorms="false" />
    ...
  </fields>

  <uniqueKey>link</uniqueKey>
  <defaultSearchField>text</defaultSearchField>

  <solrQueryParser defaultOperator="AND"/>

  <copyField source="title" dest="text"/>
  <copyField source="contributor" dest="text"/>
  ...

</schema>


2010/10/28 Gora Mohanty g...@mimirtech.com

 On Thu, Oct 28, 2010 at 5:18 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
  Is it somehow possible that you are trying to sort by a multi-valued
 field?
 [...]

 Either that, or your field gets processed into multiple tokens via the
 analyzer/tokenizer path in your schema. The reported error is a
 consequence of the fact that different documents might result in a
 different number of tokens.

 Please show us the part of schema.xml that defines the field type for
 the field title.

 Regards,
 Gora



Natural string sorting

2010-10-29 Thread RL

Just a quick question about natural sorting of strings.

I've a simple dynamic field in my schema:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="nameSort_en" type="string" indexed="true" stored="false" omitNorms="true"/>

There are 3 indexed strings for example
string1,string2,string10

Executing a query and sorting by this field leads to an unnatural ordering:
string1
string10
string2

(Some time ago I used Lucene, and I was pretty sure that Lucene used a
natural sort, so I expected the same from Solr.)
Is there a way to sort in a natural order? Config option? Plugin? Expected
output would be:
string1
string2
string10


Thanks in advance.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Natural-string-sorting-tp1791227p1791227.html
Sent from the Solr - User mailing list archive at Nabble.com.


org.tartarus package in lucene/solr?

2010-10-29 Thread Tharindu Mathew
Hi,

How come the package in $subject (org.tartarus) is present in lucene/solr?

-- 
Regards,

Tharindu


Re: Natural string sorting

2010-10-29 Thread Savvas-Andreas Moysidis
I think string10 is before string2 in lexicographic order?

On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote:


 Just a quick question about natural sorting of strings.

 I've a simple dynamic field in my schema:

 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
 <field name="nameSort_en" type="string" indexed="true" stored="false" omitNorms="true"/>

 There are 3 indexed strings for example
 string1,string2,string10

 Executing a query and sorting by this field leads to unnatural sorting of :
 string1
 string10
 string2

 (Some time ago i used Lucene and i was pretty sure that Lucene used a
 natural sort, thus i expected the same from solr)
 Is there a way to sort in a natural order? Config option? Plugin? Expected
 output would be:
 string1
 string2
 string10


 Thanks in advance.


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Natural-string-sorting-tp1791227p1791227.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Possible bug in query sorting

2010-10-29 Thread Gora Mohanty
On Fri, Oct 29, 2010 at 1:47 PM, Pablo Recio pre...@yaco.es wrote:
 That's my schema XML:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        <filter class="solr.ISOLatin1AccentFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        <filter class="solr.ISOLatin1AccentFilterFactory" />
      </analyzer>
    </fieldType>
  </types>

  <fields>
[...]
  <field name="title" type="text" indexed="true" stored="true" required="true" multiValued="false" omitNorms="false" />
  <field name="contributor" type="text" indexed="false" stored="false" required="false" multiValued="false" omitNorms="false" />
  ...
  </fields>
[...]
[...]

The issue is that you are using the WhitespaceTokenizerFactory
as an analyzer for the field. This is resulting in a different number
of tokens in different documents, which is causing the error.

Use a field that is non-tokenized, e.g., change the type of the
title field to string. If you need a tokenized title field, copy
the field to another of type string, and sort on that field instead.
Please see http://wiki.apache.org/solr/CommonQueryParameters#sort
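
A minimal sketch of that suggestion (the field name title_sort is illustrative):

<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

and then request sort=title_sort asc while still searching the tokenized title field.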

Regards,
Gora


Re: Searching for terms on specific fields

2010-10-29 Thread Imran
Cheers Hoss. That did it for me.

~~Sent by an Android
On 29 Oct 2010 00:39, Chris Hostetter hossman_luc...@fucit.org wrote:

 The specifics of your overall goal confuse me a bit, but drilling down to
 your core question...

 : I want to be able to use the dismax parser to search on both terms
 : (assigning slops and tie breaks). I take it the 'fq' is a candidate for
 : this,but can I add dismax capabilities to fq as well? Also my query
would be

 ...you can use any parser you want for fq, using the localparams syntax...

 http://wiki.apache.org/solr/LocalParams

 ..so you could have something like...

 q=foo:bar&fq={!dismax qf='yak zak'}baz

 ..the one thing you have to watch out for when using localparams and
 dismax is that the outer params are inherited by the inner params by
 default -- so if you are using dismax for your main query 'q' (with
 defType) and you have global params for qf, pf, bq, etc... those are
 inherited by your fq={!dismax} query unless you override them with local
 params


 -Hoss


Re: OutOfMemory and auto-commit

2010-10-29 Thread Tommaso Teofili
If the problem is autowarming queries running in the meantime, maybe you
could consider setting the following to true:

<useColdSearcher>false</useColdSearcher>

and/or changing this value:

<maxWarmingSearchers>2</maxWarmingSearchers>

Another option would be lowering the value of autowarmCount inside the cache
definitions.
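
For reference, those cache definitions live in solrconfig.xml and look roughly like this (sizes illustrative); autowarmCount is the knob in question:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>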

Hope this helps.
Tommaso

2010/10/25 Jonathan Rochkind rochk...@jhu.edu

 Yes, that's my question too.  Anyone?

 Dennis Gearon wrote:

 How is this avoided?

 Dennis Gearon




 --- On Thu, 10/21/10, Lance Norskog goks...@gmail.com wrote:



 From: Lance Norskog goks...@gmail.com
 Subject: Re: OutOfMemory and auto-commit
 To: solr-user@lucene.apache.org
 Date: Thursday, October 21, 2010, 9:53 PM
 Yes. Indexing activity suspends until
 the commit finishes, then
 starts. Having both queries and indexing on the same Solr
 will have
 this memory problem.

 Lance

 On Thu, Oct 21, 2010 at 1:16 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:


 If I do _not_ have any auto-commit enabled, and add 500k documents and
 commit at end, no problem.

 If I instead set auto-commit maxDocs to 100,000 (a pretty large number), and
 try to add 500k docs, with autocommits theoretically happening every 100k...
 I run into an OutOfMemory error.

 Can anyone think of any reasons that would cause this, and how to resolve
 it?

 All I can think of is that in the first case, my newSearcher and
 firstSearcher warming queries don't run until the 'document add' is
 completely done. In the second case, there are newSearcher and firstSearcher
 warming queries happening at the same time another process is continuing to
 stream 'add's to Solr.  Although at a maxDocs of 100,000, I shouldn't (I
 think) get _overlapping_ warming queries; the warming queries should be done
 before the next commit. I think. But nonetheless, just the fact that warming
 queries are happening at the same time 'add's are continuing to stream,
 could that be enough to somehow increase memory usage enough to run into
 OOM?


 --
 Lance Norskog
 goks...@gmail.com






Re: Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
I am using the SolrJ client to post my query. The query length is roughly 
10,000 characters. I am using GET like this.

int page = 1;
int resultsPerPage = 24;
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", query);
params.set("start", "" + (page - 1) * resultsPerPage);
params.set("rows", resultsPerPage);
try
{
    QueryResponse response =
        QueryServerManager.getSolrServer().query(params, SolrRequest.METHOD.GET);
    assertNotNull(response);
}
catch (SolrServerException e)
{
    e.printStackTrace();
}
This hits the exception block with the following exception

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: 
Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:122)
at 
com.getty.search.tests.DismaxQueryTestCase.testAssetQuery(DismaxQueryTestCase.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.textui.TestRunner.doRun(TestRunner.java:116)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
at junit.textui.TestRunner.doRun(TestRunner.java:109)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Swapnonil Mukherjee



On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote:

 Hi Everybody,
 
 It seems that the maximum query length supported by the Dismax Query Handler 
 is 3534 characters. Is there any way I can set this limit to around 12,000?
 
 If I fire a query beyond 3534 characters, I don't even get error messages in 
 the catalina.XXX log files.
 
 Swapnonil Mukherjee
 +91-40092712
 +91-9007131999
 
 
 



Re: Natural string sorting

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:18 +0200, RL wrote:
 Executing a query and sorting by this field leads to unnatural sorting of :
 string1
 string10
 string2

That's very much natural. Numbers are not treated any differently from
words made up of letters. You have to use alignment (zero-padding) if you
want the ordering you expect:
string01
string02
string10

 (Some time ago i used Lucene and i was pretty sure that Lucene used a
 natural sort, thus i expected the same from solr)

Lucene sorts the same way, if you just use standard sort.

 Is there a way to sort in a natural order? Config option? Plugin? Expected
 output would be:
 string1
 string2
 string10

I don't know how to do this in Solr, sorry. To do it in Lucene without
changing the terms, you could use a custom comparator that tokenizes the
Strings in numbers vs. everything else and do the compare
token-by-token, alternating between natural sort and numeric sort
depending on the token type.



Re: Overriding Tika's field processing

2010-10-29 Thread Lance Norskog
If you change 'title' to be single-valued, the Extracting thing may or
may not override it. I remember a go-round on this problem. But the
ExtractingWhatsIt has code that explicitly checks for single-valued
vs. multi-valued.

And this may all be different in different Solr versions. The
DataImportHandler has Tika support in 3.x and trunk, and the DIH gives
a lot more control about what field has what value.

On Thu, Oct 28, 2010 at 8:53 AM, Tod listac...@gmail.com wrote:
 I'm reading my document data from a CMS and indexing it using calls to curl.
  The curl call includes 'stream.url' so Tika will also index the actual
 document pointed to by the CMS' stored url.  This works fine.

 Presentation side I have a dropdown with the title of all the indexed
 documents such that when a user clicks one of them it opens in a new window.
  Using js, I've been parsing the json returned from Solr to create the
 dropdown.  The problem is I can't get the titles sorted alphabetically.

 If I use a facet.sort on the title field I get back ALL the sorted titles in
 the facet block, but that doesn't include the associated URL's.  A sorted
 query won't work because title is a multivalued field.

 The one option I can think of is to make the title single valued so that I
 have a one to one relationship to the returned url.  To do that I'd need to
 be able to *not* index the Tika returned values.

 If I read right, my understanding was that I could use 'literal.title' in
 the curl call to limit what would be included in the index from Tika.  That
 doesn't seem to be working as a test facet query returns more than I have in
 the CMS.

 Am I understanding the 'literal.title' processing correctly?  Does anybody
 have experience/suggestions on how to handle this?
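
 For reference, the kind of call being described is roughly the following (host,
 id, and document URL are illustrative); as far as I understand, literal.title
 supplies a title value but does not by itself suppress the title Tika extracts
 into a multivalued field:

 curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.title=My+CMS+Title&stream.url=http://cms.example.com/docs/report.pdf&commit=true"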


 Thanks - Tod





-- 
Lance Norskog
goks...@gmail.com


Re: RAM increase

2010-10-29 Thread satya swaroop
Hi All,

 Thanks for your reply. I have a doubt: should I increase the RAM or
heap size for Java, or for the Tomcat instance where Solr is running?


Regards,
satya


Re: Looking for Developers

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
 For me, I simply deleted the original email, but I'm now quite  
 enjoying the irony of the complaints causing more noise on the list  
 than the original email!  ;-)

He he. An old classic. Next in line is the meta-meta-discussion about
whether meta-discussions belong on the list or if they should be moved
to solr-user-meta. Repeat ad nauseam.

Job postings are on-topic IMHO, and unless their volume grows
significantly, I see no reason to create a new mailing list.



Re: Upgrading from Solr 1.2 to 1.4.1

2010-10-29 Thread Lance Norskog
Yes, from Solr 1.2 to 1.3/Lucene 2.4.1 to 2.9 there was a change in
the Porter stemmer for English. I don't know what it was. It may also
affect the other language variants of the stemmer.

If stemming is important for your users, you might want to try the
Solr 3.x branch instead, or find Lucid's KStem implementation for
1.4.1. 3.x has a lot of work on better stemmers for many languages.

On Thu, Oct 28, 2010 at 2:23 PM, Robert Muir rcm...@gmail.com wrote:
 On Thu, Oct 28, 2010 at 4:44 PM,  johnmu...@aol.com wrote:

 I'm using Solr 1.2.  If I upgrade to 1.4.1, must I re-index because of 
 LUCENE-1142?  If so, how will this affect me if I don't re-index (I'm using 
 EnglishPorterFilterFactory)?  What about when I'm using non-English stemmers 
 from Snowball?

 Beside the brief note IMPORTANT UPGRADE NOTE about this in CHANGES.txt, 
 where can I read more about this?  I looked in JIRA, LUCENE-1142, there 
 isn't much.

 I haven't looked in detail regarding these changes, but the snowball
 was upgraded to revision 500 here.
 you can see the revisions/logs of the various algorithms here:
 http://svn.tartarus.org/snowball/trunk/snowball/algorithms/?pathrev=500

 One problem being, I don't know the previous revision you were
 using...but since it had no Hungarian before LUCENE-1142, it couldn't
 have possibly been any *later* than revision 385:

    Revision 385 - Directory Listing
    Added Mon Sep 4 14:06:56 2006 UTC (4 years, 1 month ago) by martin
    New Hungarian stemmer

 This means, for example, that you would certainly be affected by
 changes in the english stemmer such as revision 414, among others:

    Revision 414 - Directory Listing
    Modified Mon Nov 20 10:49:29 2006 UTC (3 years, 11 months ago) by martin
    'arsen' as exceptional p1 position, to prevent 'arsenic' and
 'arsenal' conflating

 In my opinion, it would be best to re-index.




-- 
Lance Norskog
goks...@gmail.com


Re: No response from Solr on complex request after several days

2010-10-29 Thread Lance Norskog
There are a few problems that can happen. This is usually a sign of
garbage collection problems.
You can monitor the Tomcat instance with JConsole or one of the other
java monitoring tools and see if there is a memory leak.

Also, most people don't need to do it, but you can automatically
restart it once a day.

On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler
xavier.schep...@sciences-po.fr wrote:
 Hi,

 We are in a beta testing phase, with several users a day.

 After several days of running, the Solr server didn't respond to requests
 that require a lot of processing time.

 I'm using Solr inside Tomcat.

 This is the request that got no response from the server:

 wt=jsonomitHeader=trueq=qiAndMSwFR%3A%28transport%29q.op=ANDstart=0rows=5fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,sort=score%20descfq=solrLangCode%3AFRfacet=truefacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainIdfacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecadefacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieIdfacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionIdfacet.sort=countf.studyDecade.facet.sort=lexspellcheck=truespellcheck.count=10spellcheck.dictionary=qiAndMFRspellcheck.q=transporthl=onhl.fl=qSwFR,iHLSwFR,mHLSwFRhl.fragsize=0hl.snippets=1hl.usePhraseHighlighter=truehl.highlightMultiTerm=truehl.simple.pre=%3Cb%3Ehl.simple.post=%3C%2Fb%3Ehl.mergeContiguous=false

 It involves highlighting on a multivalued field with more than 600 short
 values inside. It takes 200 or 300 ms because of highlighting.

 After restarting tomcat all went fine again.

 I'm trying to understand why I had to restart Tomcat and Solr, and what
 I should do to keep it working 24/7.

 Xavier






-- 
Lance Norskog
goks...@gmail.com


Re: Sorting and filtering on fluctuating multi-currency price data?

2010-10-29 Thread Lance Norskog
ExternalFileField can only be used for boosting. It is not a
first-class field.

On Thu, Oct 28, 2010 at 11:07 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Another approach would be to use ExternalFileField and keep the price data,
 : normalized to USD, outside of the index. Every time the currency rates
 : changed, we would calculate new normalized prices for every document in the
 : index.

 ...that is the approach i would normally suggest.

 : Still another approach would be to do the currency conversion at IndexReader
 : warmup time. We would index native price and currency code and create a
 : normalized currency field on the fly. This would be somewhat like
 : ExternalFileField in that it involved data from outside the index, but it
 : wouldn't need to be scoped to the parent SolrIndexReader, but could be
 : per-segment. Perhaps a custom poly-field could accomplish something like
 : this?

 ...that would essentially be what ExternalFileFiled should start doing, it
 just hasn't had anyone bite the bullet to implement it yet -- if you wnat
 to tackle that, then i would suggest/request/encourage you to look at
 doing it as a patch to ExternalFileField that could be contributed back
 and reused by all.

 With all of that said: there has also been a recent contribution of a
 MoneyFieldType for dealing precisely with multicurrency
 sorting/filtering issues that you should definitely take a look at...

 https://issues.apache.org/jira/browse/SOLR-2202

 -Hoss




-- 
Lance Norskog
goks...@gmail.com


Re: Looking for Developers

2010-10-29 Thread Lance Norskog
Then, Godwin!

On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
 For me, I simply deleted the original email, but I'm now quite
 enjoying the irony of the complaints causing more noise on the list
 than the original email!  ;-)

 He he. An old classic. Next in line is the meta-meta-discussion about
 whether meta-discussions belong on the list or if they should be moved
 to solr-user-meta. Repeat ad nauseam.

 Job-postings are on-topic IMHO and unless their volume grows
 significantly, I see no reason to create a new mail lists.





-- 
Lance Norskog
goks...@gmail.com


Re: No response from Solr on complex request after several days

2010-10-29 Thread Xavier Schepler

On 29/10/2010 12:08, Lance Norskog wrote:

There are a few problems that can happen. This is usually a sign of
garbage collection problems.
You can monitor the Tomcat instance with JConsole or one of the other
java monitoring tools and see if there is a memory leak.

Also, most people don't need to do it, but you can automatically
restart it once a day.

On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler
xavier.schep...@sciences-po.fr  wrote:
   

Hi,

We are in a beta testing phase, with several users a day.

After several days of waiting, the solr server didn't respond to requests
that require a lot of processing time.

I'm using Solr inside Tomcat.

This is the request that had no response from the server :

wt=jsonomitHeader=trueq=qiAndMSwFR%3A%28transport%29q.op=ANDstart=0rows=5fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,sort=score%20descfq=solrLangCode%3AFRfacet=truefacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainIdfacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecadefacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieIdfacet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionIdfacet.sort=countf.studyDecade.facet.sort=lexspellcheck=truespellcheck.count=10spellcheck.dictionary=qiAndMFRspellcheck.q=transporthl=onhl.fl=qSwFR,iHLSwFR,mHLSwFRhl.fragsize=0hl.snippets=1hl.usePhraseHighlighter=truehl.highlightMultiTerm=truehl.simple.pre=%3Cb%3Ehl.simple.post=%3C%2Fb%3Ehl.mergeContiguous=false

It involves highlighting on a multivalued field with more than 600 short
values inside. It takes 200 or 300 ms because of highlighting.

After restarting tomcat all went fine again.

I'm trying to understand why I had to restart Tomcat and Solr, and what
I should do to keep it working 24/7.

Xavier



 



   

Thanks for your response.
Today, I've increased the Tomcat JVM heap size from 128-256 MB to 
1024-2048 MB. I will see if it helps.





Re: RAM increase

2010-10-29 Thread Lance Norskog
When you start the Tomcat app, you tell it how much memory to allocate
to the JVM. I don't remember where, probably in catalina.sh.

On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop satya.yada...@gmail.com wrote:
 Hi All,

         Thanks for your reply.I have a doubt whether to increase the ram or
 heap size to java or to tomcat where the solr is running


 Regards,
 satya




-- 
Lance Norskog
goks...@gmail.com


Re: QueryElevation Component is so slow

2010-10-29 Thread Lance Norskog
I do not know if this is accurate. There are direct tools to monitor
these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these
counts allot many things to one place that should be spread out.

On Fri, Oct 29, 2010 at 12:27 AM, Chamnap Chhorn
chamnapchh...@gmail.com wrote:
 anyone has some suggestions to improve the search?
 thanks

 On 10/28/10, Chamnap Chhorn chamnapchh...@gmail.com wrote:
 Sorry for very bad pasting. I paste it again.

 Slowest Components                                      Count  Exclusive         Total
 QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
 SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
 org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
 QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
 DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
 FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
  chamnapchh...@gmail.com wrote:

 Hi,

 I'm using solr 1.4 and using QueryElevation Component for guaranteed
 search
 position. I have around 700,000 documents with 1 Mb elevation file. It
 turns
 out it is quite slow on the newrelic monitoring website:

  Slowest Components                                      Count  Exclusive         Total
  QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
  SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
  org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
  QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
  DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
  FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 As you can see, QueryElevationComponent takes quite a lot of time. Any
 suggestions on how to improve this?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Lance Norskog
goks...@gmail.com


Influencing scores on values in multiValue fields

2010-10-29 Thread Imran
Hi All

We've got an index in which we have a multiValued field per document.

Assume the multivalue field values in each document to be;

Doc1:
bar lifters

Doc2:
truck tires
back drops
bar lifters

Doc 3:
iron bar lifters

Doc 4:
brass bar lifters
iron bar lifters
tire something
truck something
oil gas

Now when we search for 'bar lifters' the expectation (based on the
requirements) is that we get results in the order of Doc1, Doc 2, Doc4 and
Doc3.
Doc 1 - since there's an exact match (and only one) for the search terms
Doc 2 - since there's an exact match amongst the values
Doc 4 - since there's a partial match on the values, but the number of
matches is greater than in Doc 3
Doc 3 - since there's a partial match

However, the results come out as Doc1, Doc3, Doc2, Doc4. Looking at the
explanation of the result, it appears Doc 2 is losing to Doc3 and Doc 4 is
losing to Doc3 based on length normalisation.

We think we can see the reason for that - the field length in doc2 is
greater than doc3's, and doc4's is greater than doc3's.
However, is there any mechanism by which we can force doc2 to beat doc3 and
doc4 to beat doc3 with this structure?

We did look at using omitNorms=true, but that messes up the scores for all
docs. The result comes out as Doc4, Doc1, Doc2, Doc3 (where Doc1, Doc2 and
Doc3 get the same score).
This is because the fieldNorm is not taken into account anymore (as
expected), with term frequency being the only contributing factor. So
trying to avoid length normalisation through omitNorms is not helping.

Is there any way we can influence an exact match of a value in a
multiValued field to add to the overall score whilst keeping the length
normalisation?

Hope that makes sense.

Cheers
-- Imran


Re: Exception while processing: attach document

2010-10-29 Thread Bac Hoang

 Could anyone shed some light, please?

I saw in the log a message as below, but I don't think it's the root 
cause, because in my dataSource, readOnly is true:


Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are 
the only valid transaction levels
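
For what it's worth, a minimal standalone JDBC check like the sketch below (class name illustrative; driver, URL, and credentials copied from the dataSource config) can confirm whether the query and the transaction settings work outside of DIH:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcCheck {
    public static void main(String[] args) throws Exception {
        // Same driver and URL as the DIH dataSource definition.
        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22", "abc", "xyz");
        // Oracle only accepts READ_COMMITTED and SERIALIZABLE here;
        // any other level raises the SQLException seen in the log.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        conn.setReadOnly(true);
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("select * from A.B");
        while (rs.next()) {
            System.out.println(rs.getString("TOPIC"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}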


A newbie Solr user

=

On 10/29/2010 1:49 PM, Bac Hoang wrote:

Hello all,

I'm getting stuck when trying to import an Oracle DB into the Solr index;
could any of you give a hand? Thanks a million.


Below is some short info that may help frame the question.

My Solr: 1.4.1

=== LOG ===
INFO: Starting Full Import
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity attach with URL: jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: attach document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from A.B Processing Document # 1


where A: a schema
B: a table

=== dataSource ===
<dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22" user="abc" password="xyz"
            readOnly="true" autoCommit="false" batchSize="1"/>
<document>
  <entity dataSource="jdbc" name="attach" query="select * from A.B">
    <entity processor="SqlEntityProcessor" dataField="attach.TOPIC" format="text">
      <field column="text" name="text" />
    </entity>
  </entity>
</document>

where TOPIC is a field of table B

Thanks again



RE: Influencing scores on values in multiValue fields

2010-10-29 Thread Michael Sokolov
How about creating another field for doing exact matches (a string);
searching both and boosting the string match?
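
A rough sketch of that idea (field names illustrative): copy the raw values into
an untokenized string field next to the tokenized one, then boost exact hits
with a boost query:

<field name="tags_exact" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="tags" dest="tags_exact"/>

and at query time something like defType=dismax&qf=tags&bq=tags_exact:"bar lifters"^5,
so documents containing the exact value rise above partial matches while length
normalization on the tokenized field stays intact.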

-Mike 

 -Original Message-
 From: Imran [mailto:imranboho...@gmail.com] 
 Sent: Friday, October 29, 2010 6:25 AM
 To: solr-user@lucene.apache.org
 Subject: Influencing scores on values in multiValue fields
 
 Hi All
 
 We've got an index in which we have a multiValued field per document.
 
 Assume the multivalue field values in each document to be;
 
 Doc1:
 bar lifters
 
 Doc2:
 truck tires
 back drops
 bar lifters
 
 Doc 3:
 iron bar lifters
 
 Doc 4:
 brass bar lifters
 iron bar lifters
 tire something
 truck something
 oil gas
 
 Now when we search for 'bar lifters' the expectation (based on the
 requirements) is that we get results in the order of Doc1, 
 Doc 2, Doc4 and Doc3.
 Doc 1 - since there's an exact match (and only one) for the 
 search terms Doc 2 - since ther'e an exact match amongst the 
 values Doc 4 - since there's a partial match on the values 
 but the number of matches are more than Doc 3 Doc 3 - since 
 there's a partial match
 
 However, the results come out as Doc1, Doc3, Doc2, Doc4. 
 Looking at the explaination of the result it appears Doc 2 is 
 loosing to Doc3 and Doc 4 is loosing to Doc3 based on length 
 normalisation.
 
 We think we can see the reason for that - the field length in 
 doc2 is greater than doc3 and doc 4 is greater doc3.
 However, is there any mechanism I can force doc2 to beat doc3 
 and doc4 to beat doc3 with this structure.
 
 We did look at using omitNorms=true, but that messes up the 
 scores for all docs. The result comes out as Doc4, Doc1, 
 Doc2, Doc3 (where Doc1, Doc2 and
 Doc3 gets the same score)
 This is because the fieldNorm is not taken into account anymore (as
 expected) and the termFrequence being the only contributing 
 factor. So trying to avoid length normalisation through 
 omitNorms is not helping.
 
 Is there anyway where we can influence an exact match of a 
 value in a multiValue field to add on to the overall score 
 whilst keeping the lenght normalisation?
 
 Hope that makes sense.
 
 Cheers
 -- Imran
 



Re: Reverse range query

2010-10-29 Thread kenf_nc

I modified the text of this hopefully to make it clearer. I wasn't sure what
I was asking was coming across well. And I'm adding this comment in a
shameless attempt to boost my question back to the top for people to see.
Before I write a messy work around, just wanted to check the community to
see if this was already handled, it seems like a useful, common, data type.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Reverse-range-query-tp1789135p1792126.html
Sent from the Solr - User mailing list archive at Nabble.com.


eDismax result differs from Dismax

2010-10-29 Thread Ryan Walker

We are launching a new version of our job board helping returning veterans find 
a civilian job, and we chose Solr and Sunspot[1] to power our search. We really 
didn't consider the power users in the HR world who are trained to use boolean 
search, for example:

Engineer AND (Electrical OR Mechanical)

Sunspot supports the Dismax request handler, which unfortunately does not 
handle the query above properly. So we read about eDismax and that it was baked 
into Solr 1.5. At the same time, Sunspot has switched from LocalSolr 
integration to storing a geohash in a full-text searchable field.

We're having some problems with some complex queries that Sunspot generates:

INFO: [] webapp=/solr path=/select 
params={fl=+scorestart=0q=query:{!dismax+qf%3D'title_text+description_text'}Ruby+on+Rails+Developer+(location_details_s:dngythdb25fu^1.0+OR+location_details_s:dngythdb25f^0.0625+OR+location_details_s:dngythdb25*^0.00391+OR+location_details_s:dngythdb2*^0.000244+OR+location_details_s:dngythdb*^0.153+OR+location_details_s:dngythd*^0.00954+OR+location_details_s:dngyth*^0.000596+OR+location_details_s:dngyt*^0.373+OR+location_details_s:dngy*^0.0233+OR+location_details_s:dng*^0.00146)wt=rubyfq=type:JobdefType=edismaxrows=20}
 hits=1 status=0 QTime=13

Under Dismax no results are returned for this query; however, as you can see 
above, with eDismax a result is returned -- the only difference between the two 
queries is 'defType=edismax' vs 'defType=dismax'.

Debug Output Solr 1.5 eDismax:
https://gist.github.com/32f3a52064ec300fdca0

Debug Output Solr 1.5 Dismax:
https://gist.github.com/d82b82a026878ecce36b

My question is whether you have any ideas why the query above returns a record 
that doesn't match under eDismax.

We are at a crossroads where we have to decide if we want to forge ahead with 
Sunspot 1.2rc4 and Solr 1.5, or we may fall back to Sunspot 1.1 and Solr 1.4 
until Solr 3.1/4.0 come out, hopefully with eDismax support and better location 
search support.

I plan to do a blog posting on this issue when we figure it out, I'll give you 
props if you can help us out :)

Best regards,

Ryan Walker
Chief Experience Officer
http://www.recruitmilitary.com
513.677.7078

[1] http://outoftime.github.com/sunspot/

Re: eDismax result differs from Dismax

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 9:30 AM, Ryan Walker r...@recruitmilitary.com wrote:

 We are launching a new version of our job board helping returning veterans 
 find a civilian job, and we chose Solr and Sunspot[1] to power our search. We 
 really didn't consider the power users in the HR world who are trained to use 
 boolean search, for example:

 Engineer AND (Electrical OR Mechanical)

 Sunspot supports the Dismax request handler, which unfortunately does not 
 handle the query above properly. So we read about eDismax and that it was 
 baked into Solr 1.5. At the same time, Sunspot has switched from LocalSolr 
 integration to storing a geohash in a full-text searchable field.

 We're having some problems with some complex queries that Sunspot generates:

 INFO: [] webapp=/solr path=/select 
 params={fl=+scorestart=0q=query:{!dismax+qf%3D'title_text+description_text'}Ruby+on+Rails+Developer+(location_details_s:dngythdb25fu^1.0+OR+location_details_s:dngythdb25f^0.0625+OR+location_details_s:dngythdb25*^0.00391+OR+location_details_s:dngythdb2*^0.000244+OR+location_details_s:dngythdb*^0.153+OR+location_details_s:dngythd*^0.00954+OR+location_details_s:dngyth*^0.000596+OR+location_details_s:dngyt*^0.373+OR+location_details_s:dngy*^0.0233+OR+location_details_s:dng*^0.00146)wt=rubyfq=type:JobdefType=edismaxrows=20}
  hits=1 status=0 QTime=13

 Under Dismax no results are returned for this query, however, as you can see 
 above with eDismax a result is returned -- the only difference between the 
 two queries are 'defType=edismax' vs 'defType=dismax'

That's to be expected.  Dismax doesn't even support fielded queries
(where you specify the fieldname in the query itself) so this clause
is treated all as text:

(location_details_s:dngythdb25fu^1.0

and dismax QP will be looking for tokens like location_details_s
dngythdb25fu (assuming tokenization would split on the
non-alphanumeric chars) in your text fields.

-Yonik
http://www.lucidimagination.com


Re: Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
Solved this issue by setting maxHttpHeaderSize to 65536 in the 
tomcat/conf/server.xml file.

Otherwise Tomcat was not responding.
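
For anyone hitting the same wall, the attribute goes on the HTTP connector element in server.xml; a sketch (the other attributes shown are typical defaults, not prescriptions):

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000"
           redirectPort="8443" maxHttpHeaderSize="65536" />

An alternative that sidesteps header limits entirely is to send the query as a POST, i.e. passing SolrRequest.METHOD.POST instead of SolrRequest.METHOD.GET in the SolrJ query call.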

Swapnonil Mukherjee



On 29-Oct-2010, at 2:43 PM, Swapnonil Mukherjee wrote:

I am using the SolrJ client to post my query. The query length is roughly 
10,000 characters. I am using GET like this.

int page = 1;
int resultsPerPage = 24;
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", query);
params.set("start", "" + (page - 1) * resultsPerPage);
params.set("rows", resultsPerPage);
try
{
    QueryResponse response =
        QueryServerManager.getSolrServer().query(params, SolrRequest.METHOD.GET);
    assertNotNull(response);
}
catch (SolrServerException e)
{
    e.printStackTrace();
}
This hits the exception block with the following exception

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: 
Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:122)
at 
com.getty.search.tests.DismaxQueryTestCase.testAssetQuery(DismaxQueryTestCase.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.textui.TestRunner.doRun(TestRunner.java:116)
at com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
at junit.textui.TestRunner.doRun(TestRunner.java:109)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Swapnonil Mukherjee



On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote:

Hi Everybody,

It seems that the maximum query length supported by the Dismax Query Handler is 
3534 characters. Is there any way I can set this limit to around 12,000?

If I fire a query beyond 3534 characters, I don't even get error messages in 
the catalina.XXX log files.

Swapnonil Mukherjee
+91-40092712
+91-9007131999







Re: QueryElevation Component is so slow

2010-10-29 Thread Chamnap Chhorn
Thanks for the reply.

I'm looking for how to improve the speed of the search query. The
QueryElevation component is taking too much time, which is
unacceptable. The size of the elevation file is only 1 MB. I wonder whether
other people use this component without problems (related to speed)? Am I
using it the wrong way, or is there a limit when using this component?

On 10/29/10, Lance Norskog goks...@gmail.com wrote:
 I do not know if this is accurate. There are direct tools to monitor
 these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these
 counts allot many things to one place that should be spread out.

 On Fri, Oct 29, 2010 at 12:27 AM, Chamnap Chhorn
 chamnapchh...@gmail.com wrote:
 anyone has some suggestions to improve the search?
 thanks

 On 10/28/10, Chamnap Chhorn chamnapchh...@gmail.com wrote:
 Sorry for very bad pasting. I paste it again.

 Slowest Components                                      Count  Exclusive         Total
 QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
 SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
 org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
 QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
 DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
 FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
  chamnapchh...@gmail.com wrote:

 Hi,

 I'm using solr 1.4 and using QueryElevation Component for guaranteed
 search
 position. I have around 700,000 documents with 1 Mb elevation file. It
 turns
 out it is quite slow on the newrelic monitoring website:

  Slowest Components                                      Count  Exclusive         Total
  QueryElevationComponent                                 1      506,858 ms 100%   506,858 ms 100%
  SolrIndexSearcher                                       1      2.0 ms 0%         2.0 ms 0%
  org.apache.solr.servlet.SolrDispatchFilter.doFilter()   1      1.0 ms 0%         506,862 ms 100%
  QueryComponent                                          1      1.0 ms 0%         1.0 ms 0%
  DebugComponent                                          1      0.0 ms 0%         0.0 ms 0%
  FacetComponent                                          1      0.0 ms 0%         0.0 ms 0%

 As you can see, QueryElevationComponent takes quite a lot of time. Any
 suggestions on how to improve this?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




 --
 Lance Norskog
 goks...@gmail.com



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


RE: Natural string sorting

2010-10-29 Thread Bob Sandiford
Well, you could do a magnitude notation approach.  Depends on how complex the 
strings are, but based on your examples, this would work:

1) Identify each series of digits in the string.  (This assumes each series is 
no more than 9 digits long.)

2) Insert the number of digits into the string immediately before the series 
itself.

So - for sorting - you would have:

string1  -> string11
string10 -> string210
string2  -> string12

which will then sort as string11, string12, string210, but use the original 
strings as the displays you want.
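
A minimal sketch of that transform (assuming digit runs of at most 9 digits; the class name is illustrative):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaturalSortKey {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Prefix each digit run with its length so that lexicographic order
    // on the result matches numeric order on the original.
    public static String toSortKey(String value) {
        Matcher m = DIGITS.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group().length() + m.group());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints string11, string12, string210 for the examples above.
        for (String s : new String[] { "string1", "string2", "string10" }) {
            System.out.println(s + " -> " + toSortKey(s));
        }
    }
}

The transformed value would go into the sort field (e.g. via a custom update step or a token filter), while the untouched string stays in the display field.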

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 

 -Original Message-
 From: Savvas-Andreas Moysidis
 [mailto:savvas.andreas.moysi...@googlemail.com]
 Sent: Friday, October 29, 2010 4:33 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Natural string sorting
 
 I think string10 is before string2 in lexicographic order?
 
 On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote:
 
 
  Just a quick question about natural sorting of strings.
 
  I've a simple dynamic field in my schema:
 
  fieldType name=string class=solr.StrField sortMissingLast=true
  omitNorms=true/
  field name=nameSort_en type=string indexed=true stored=false
  omitNorms=true/
 
  There are 3 indexed strings for example
  string1,string2,string10
 
  Executing a query and sorting by this field leads to unnatural
 sorting of :
  string1
  string10
  string2
 
  (Some time ago i used Lucene and i was pretty sure that Lucene used a
  natural sort, thus i expected the same from solr)
  Is there a way to sort in a natural order? Config option? Plugin?
 Expected
  output would be:
  string1
  string2
  string10
 
 
  Thanks in advance.
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Natural-string-sorting-
 tp1791227p1791227.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


RE: spellchecker results not as desired

2010-10-29 Thread Dyer, James
You should be building your index on a field that creates tokens on whitespace. 
That way your dictionary would have iphone and case as separate terms instead of 
iphone case as one term.  And if you query on something like iphole case, 
it will give suggestions for iphole but not for case, because the latter is 
in the dictionary.  (The spellchecker will always assume a term is correctly 
spelled if it is in the dictionary.)

If you set collate=true, in addition to getting word-by-word suggestions, it 
will return a re-written query (aka a collation).  SOLR 1.4 will always use 
the top suggestion for each word to form the collation.  In this example, the 
collation would be iphone case.  You can then re-query SOLR with the collation 
and hope to get better hits.  While 1.4 doesn't check to see if the collation 
is going to return any hits, an enhancement to 3.x and 4.0 allows you to 
guarantee that collations will always give you hits if you re-query them.
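
A hedged example of such a request, assuming a handler wired up with the spellcheck component and the dictionary described above:

q=iphole case&spellcheck=true&spellcheck.count=10&spellcheck.collate=true

The response then carries the per-word suggestions plus a single collation string such as iphone case.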

As for your second question, likely ipj is close enough to ipad to warrant 
a suggestion but the others are not considered close enough.  You can tweak 
this by setting spellcheck.accuracy.  However, I do not believe this option is 
available in 1.4.  The wiki indicates it is 3.x/4.0 only.

For more information, look at the SpellCheckComponent page on the wiki.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: abhayd [mailto:ajdabhol...@hotmail.com] 
Sent: Thursday, October 28, 2010 4:34 PM
To: solr-user@lucene.apache.org
Subject: spellchecker results not as desired


hi 

I added the spellchecker to a request handler. The spellchecker is index-based.
Terms in the index are like:
iphone
iphone 4
iphone case
phone
gophone

When I set q=iphole I get suggestions like:
iphone
phone
gophone
ipad

I'm not sure how I would get iphone, iphone 4, iphone case, phone. Any thoughts?

At the same time, when I type ipj I get ipad as the result; why not
iphone, iphone 4, ipad?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/spellchecker-results-not-as-desired-tp1789192p1789192.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: RAM increase

2010-10-29 Thread Tommaso Teofili
Hello Lance,
from the command line run:

 export JAVA_OPTS='-d64 -Xms128m -Xmx5g'

adjusting the values of Xms and Xmx as needed.
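
To make that setting stick across restarts, one common convention (assuming a Tomcat 6 layout; path illustrative) is to put it in bin/setenv.sh, which catalina.sh sources at startup:

# $CATALINA_HOME/bin/setenv.sh
export JAVA_OPTS="-Xms512m -Xmx1024m"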
Hope this helps.
Tommaso

2010/10/29 Lance Norskog goks...@gmail.com

 When you start the Tomcat app, you tell it how much memory to allocate
 to the JVM. I don't remember where, probably in catalina.sh.

 On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop satya.yada...@gmail.com
 wrote:
  Hi All,
 
  Thanks for your reply.I have a doubt whether to increase the ram
 or
  heap size to java or to tomcat where the solr is running
 
 
  Regards,
  satya
 



 --
 Lance Norskog
 goks...@gmail.com



Something for the weekend - Lily 0.2 is OUT ! :)

2010-10-29 Thread Steven Noels
Dear all,

three months after the highly anticipated proof of architecture release,
we're living up to our promises, and are releasing Lily 'CR' 0.2 today - a
fully-distributed, highly scalable and highly available content repository,
marrying best-of-breed database and search technology into a powerful,
productive and easy-to-use solution for contemporary internet-scale content
applications.
For whom

You're building content applications (content management, archiving, asset
management, DMS, WCMS, portals, ...) that scale well, either as a product, a
project or in the cloud. You need a trustworthy underlying content
repository that provides a flexible and easy-to-use content model you can
adapt to your requirements. You have a keen interest in NoSQL/HBase
technology but need a higher-level API, plus scalable indexing and search as
well.
Foundations

Lily builds further upon Apache HBase and Apache SOLR. HBase is a faithful
implementation of the Google BigTable database, and provides infinite
elastic scaling and high-performance access to huge amounts of data. SOLR is
the server version of Lucene, the industry-standard search library. Lily
joins HBase and SOLR in a single, solidly packaged content repository
product, with automated sharding (making use of multiple hardware nodes to
provide scaling of volume and performance) and automatic index maintenance.
Lily adds a sophisticated, yet flexible and surprisingly practical content
schema on top of this, providing the structure of more classic
databases, versioning, secondary indexing, queuing: all the things developers
care about when solving real-world problems.
Key features of this release

   - Fully distributed: Lily has a fully-distributed architecture making
   maximum use of all available hardware for scalability and availability.
   ZooKeeper is used for distributed process coordination, configuration and
   locking. Index maintenance is based on an HBase-backed RowLog mechanism
   allowing fast but reliable updating of SOLR indexes.
   - Index maintenance: Lily offers all the features and functionality of
   SOLR, but makes index maintenance a breeze, both for interactive as-you-go
   updating and MapReduce-based full index rebuilds
   - Multi-indexers: for high-load situations, multiple indexers can work in
   parallel and talk to a sharded SOLR setup
   - REST interface: a flexible and platform-neutral access method for all
   Lily operations using HTTP and JSON
   - Improved content model: we added URI as a base Lily type as a (small)
   indication of our interest in semantic technology

More importantly, we commit ourselves to take care of API compatibility and
data format layout from this release onwards - as much as humanly possible.
Lily 0.2 offers the API we want to support in the final release. Lily 0.2 is
our contract for content application developers: upgrading to Lily final
should require as few code or data changes as possible.
From where

Download Lily from www.lilyproject.org. It's Apache Licensed Open Source. No
strings attached.
Enterprise support

Together with this release, we're rolling out our commercial support
services http://outerthought.org/site/services/lily.html (and signed up a
first customer, yay!) that allows you to use Lily with peace of mind. Also,
this release has been fully tested and depends on the latest Cloudera
Distribution for Hadoop http://www.cloudera.com/hadoop/ (CDH3 beta3).
Next up

Lily 1.0 is planned for March 2011, with an interim release candidate in
January. We'll be working on performance enhancements, feature additions,
and are happily - eagerly - awaiting your feedback and comments. We'll post
a roadmap for Lily 0.3 and onwards by mid November.
Follow us

If you want to keep track of Lily's on-going development, join the Lily
discussion list or follow our company Twitter
@outerthoughthttp://twitter.com/#%21/outerthought
.
Thank you

I'd like to thank Bruno and Evert for their hard work so far, the HBase and
SOLR community for their help, the IWT government fund for their partial
financial support, and all of our early Lily adopters and enthusiasts for
their much valued feedback. You guys rock!

Steven.
-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily


Re: Exception while processing: attach document

2010-10-29 Thread Tommaso Teofili
I think this is a JDBC warning message, since some isolation levels may not
be implemented in the specific (Oracle) driver (e.g.:
READ_UNCOMMITTED).
Could your issue be related to transactions updating/inserting/deleting
records on your Oracle DB while DIH is running?
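If the isolation level is the culprit, one thing worth trying is to set it
explicitly on the data source. A minimal sketch, assuming the
transactionIsolation attribute documented on the DataImportHandler wiki is
available in your Solr version (Oracle only accepts READ_COMMITTED and
SERIALIZABLE, so force one of those):

<dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22"
            user="abc" password="xyz"
            batchSize="1" autoCommit="false"
            transactionIsolation="TRANSACTION_READ_COMMITTED"/>

Note that readOnly="true" may itself be what requests the unsupported
READ_UNCOMMITTED level in some versions, so it is worth testing with that
attribute removed as well.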
Regards,
Tommaso

2010/10/29 Bac Hoang bac.ho...@axonactive.vn

  Could anyone shed some light, please?

 I saw in the log a message as below, but I don't think it's the root cause,
 because in my dataSource, readOnly is true

 Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the
 only valid transaction levels

 A newbie Solr user

 =


 On 10/29/2010 1:49 PM, Bac Hoang wrote:

 Hello all,

 I'm getting stuck when trying to import oracle DB to solr index, could any
 one of you give a hand. Thanks million.

 Below is some short info. that might be a question

 My Solr: 1.4.1

  *LOG *
 INFO: Starting Full Import
 Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter
 readIndexerProperties
 INFO: Read dataimport.properties
 Oct 29, 2010 1:19:35 PM
 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity attach with URL:
 jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
 Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 *SEVERE: Exception while processing: attach document *:
 SolrInputDocument[{}]
 org.apache.solr.handler.dataimport.DataImportHandlerException: *Unable to
 execute query: *select * from A.B Processing Document # 1
 
 where A: a schema
 B: a table

  *dataSource *===
 <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
  url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22" user="abc"
  password="xyz"
  readOnly="true" autoCommit="false" batchSize="1"/>
 <document>
 <entity dataSource="jdbc" name="attach" query="select * from A.B">
 <entity processor="SqlEntityProcessor" dataField="attach.TOPIC"
 format="text">
 <field column="text" name="text" />
 </entity>
 </entity>
 </document>
 
 where TOPIC is a field of table B

 Thanks again




Re: Multiple indexes inside a single core

2010-10-29 Thread Valli Indraganti
Here's the Jira issue for the distributed search issue.
https://issues.apache.org/jira/browse/SOLR-1632

I tried applying this patch but I get the same error that is posted in the
discussion section for that issue. I will be glad to help too on this one.

On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote:

 Ah, I should have read more carefully...

 I remember this being discussed on the dev list, and I thought there might
 be
 a Jira attached but I sure can't find it.

 If you're willing to work on it, you might hop over to the solr dev list
 and
 start
 a discussion, maybe ask for a place to start. I'm sure some of the devs
 have
 thought about this...

  If nobody on the dev list says "There's already a JIRA on it", then you
 should
 open one. The Jira issues are generally preferred when you start getting
 into
 design because the comments are preserved for the next person who tries
 the idea or makes changes, etc

 Best
 Erick

 On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com
 wrote:

  Thanks Erick.  The problem with multiple cores is that the documents are
  scored independently in each core.  I would like to be able to search
 across
  both cores and have the scores 'normalized' in a way that's similar to
 what
  Lucene's MultiSearcher would do.  As far as I understand, multiple cores
  would likely result in seriously skewed scores in my case since the
  documents are not distributed evenly or randomly.  I could have one
  core/index with 20 million docs and another with 200.
 
  I've poked around in the code and this feature doesn't seem to exist.  I
  would be happy with finding a decent place to try to add it.  I'm not
 sure
  if there is a clean place for it.
 
  Ben
 
  On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   It seems to me that multiple cores are along the lines you
   need, a single instance of Solr that can search across multiple
   sub-indexes that do not necessarily share schemas, and are
    independently maintainable.
  
   This might be a good place to start:
  http://wiki.apache.org/solr/CoreAdmin
  
   HTH
   Erick
  
   On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com
  wrote:
  
   We are trying to convert a Lucene-based search solution to a
   Solr/Lucene-based solution.  The problem we have is that we currently
  have
   our data split into many indexes and Solr expects things to be in a
  single
   index unless you're sharding.  In addition to this, our indexes
 wouldn't
   work well using the distributed search functionality in Solr because
 the
   documents are not evenly or randomly distributed.  We are currently
  using
   Lucene's MultiSearcher to search over subsets of these indexes.
  
   I know this has been brought up a number of times in previous posts
 and
  the
   typical response is that the best thing to do is to convert everything
  into
   a single index.  One of the major reasons for having the indexes split
  up
   the way we do is because different types of data need to be indexed at
   different intervals.  You may need one index to be updated every 20
  minutes
   and another is only updated every week.  If we move to a single index,
  then
   we will constantly be warming and replacing searchers for the entire
   dataset, and will essentially render the searcher caches useless.  If
 we
   were able to have multiple indexes, they would each have a searcher
 and
   updates would be isolated to a subset of the data.
  
   The other problem is that we will likely need to shard this large
 single
    index and there isn't a clean way to shard randomly and evenly across the
    whole of the data.  We would, however, like to shard a single data type.  If we
  could
   use multiple indexes, we would likely be also sharding a small sub-set
  of
   them.
  
   Thanks in advance,
  
   Ben
  
 

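As a point of reference while SOLR-1632 is pending: stock distributed search
is driven by the shards request parameter. A minimal sketch, with host and
core names purely illustrative:

http://localhost:8983/solr/core0/select?q=solr&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

Each shard is queried independently and the results merged, but IDF is
computed per shard, which is exactly why unevenly distributed documents skew
scores (the problem SOLR-1632 aims to fix).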


Re: Stored or indexed?

2010-10-29 Thread Elizabeth L. Murnane
Hi Ron,

In a nutshell - an indexed field is searchable, and a stored field has its 
content stored in the index so it is retrievable. Here are some examples that 
will hopefully give you a feel for how to set the indexed and stored options:

indexed=true stored=true
Use this for information you want to search on and also display in search 
results - for example, book title or author.

indexed=false stored=true
Use this for fields that you want displayed with search results but that don't 
need to be searchable - for example, destination URL, file system path, time 
stamp, or icon image.

indexed=true stored=false
Use this for fields you want to search on but don't need to get their values in 
search results. Here are some of the common reasons you would want this:

Large fields and a database: Storing a field makes your index larger, so set 
stored to false when possible, especially for big fields. For this case a 
database is often used, as the previous responder said. Use a separate 
identifier field to get the field's content from the database.

Ordering results: Say you define <field name="bookName" type="text" 
indexed="true" stored="true"/> that is tokenized and used for searching. If you 
want to sort results based on book name, you could copy the field into a 
separate nonretrievable, nontokenized field that can be used just for sorting:
<field name="bookSort" type="string" indexed="true" stored="false"/>
<copyField source="bookName" dest="bookSort"/>

Easier searching: If you define the field <field name="text" type="text" 
indexed="true" stored="false" multiValued="true"/> you can use it as a 
catch-all field that contains all of the other text fields. Since solr looks in 
a default field when given a text query without field names, you can support 
this type of general phrase query by making the catch-all the default field.

indexed=false stored=false
Use this when you want to ignore fields. For example, the following will ignore 
unknown fields that don't match a defined field rather than throwing an error 
by default.
<fieldtype name="ignored" stored="false" indexed="false"/>
<dynamicField name="*" type="ignored"/>


Elizabeth Murnane
emurn...@architexa.com
Architexa Lead Developer - www.architexa.com
Understand  Document Code In Seconds


--- On Thu, 10/28/10, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com
Subject: Re: Stored or indexed?
To: solr-user@lucene.apache.org
Date: Thursday, October 28, 2010, 4:25 AM

In our case, we just store a database id and do a secondary db query when
displaying the results.
This is handy and leads to a more centralised architecture when you need to
display properties of a domain object which you don't index/search.

On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote:


 Interesting wiki link, I hadn't seen that table before.

 And to answer your specific question about indexed=true, stored=false, this
 is most often done when you are using analyzers/tokenizers on your field.
  This field is for search only; you would never retrieve its contents for
 display. It may in fact be an amalgam of several fields into one 'content'
 field. You have your display copy stored in another field marked
 indexed=false, stored=true and optionally compressed. I also have simple
 string fields set to lowercase so searching is case-insensitive, and have a
  duplicate field where the string is normal case. The first one is
 indexed/not stored, the second is stored/not indexed.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How can I disable fsync()?

2010-10-29 Thread Igor Chudov
Thanks to all and I made Solr work very well on one newer machine.

Now I am setting up Solr on an older server with an IDE hard drive.

Unfortunately, populating the index takes FOREVER due to
Solr/Lucene/Tomcat calling fsync() a lot after every write.

I would like to know how to disable fsync.

I am very aware of the risks of not having fsync() and I DO NOT CARE
ABOUT THEM AND DO NOT WANT TO BE REMINDED.

I just want to know how can I disable fsync() when adding to Solr index.

Thanks, guys!

Igor


Re: documentCache clarification

2010-10-29 Thread Jay Luker
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 The queryResultCache is keyed on Query,Sort,Start,Rows,Filters and the
 value is a DocList object ...

 http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html

 Unlike the Document objects in the documentCache, the DocLists in the
 queryResultCache never get modified (techincally Solr doesn't actually
 modify the Documents either, the Document just keeps track of it's fields
 and updates itself as Lazy Load fields are needed)

 if a DocList containing results 0-10 is put in the cache, it's not
 going to be of any use for a query with start=50.  but if it contains 0-50
  it *can* be used if start < 50 and rows < 50 -- that's where the
  queryResultWindowSize comes in.  if you use start=0&rows=10, but your
  window size is 50, SolrIndexSearcher will (under the covers) use
  start=0&rows=50 and put that in the cache, returning a slice from 0-10
 for your query.  the next query asking for 10-20 will be a cache hit.

This makes sense but still doesn't explain what I'm seeing in my cache
stats. When I issue a request with rows=10 the stats show an insert
into the queryResultCache. If I send the same query, this time with
rows=1000, I would not expect to see a cache hit but I do. So it seems
like there must be something useful in whatever gets cached on the
first request for rows=10 for it to be re-used by the request for
rows=1000.

--jay
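For readers following along, the knobs under discussion live in
solrconfig.xml; a minimal sketch, with sizes purely illustrative:

<!-- cache for ordered lists of document ids returned by queries -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<!-- result ranges are rounded up to a multiple of this before caching -->
<queryResultWindowSize>50</queryResultWindowSize>
<!-- result sets larger than this are never cached -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>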


Custom Sorting in Solr

2010-10-29 Thread Ezequiel Calderara
Hi all guys!
I'm in a weird situation here.
We have indexed a set of documents which are ordered using a linked list (each
document has a reference to the previous and the next).

Is there a way, when sorting in the Solr search, to use the linked list to sort?


If that is not possible, how can I use the DIH to access a service in WCF or
a web service? Should I develop my own DIH?


-- 
__
Ezequiel.

Http://www.ironicnet.com


RE: Custom Sorting in Solr

2010-10-29 Thread Jonathan Rochkind
There's no way I know of to make Solr use that kind of data to create the sort 
order you want. 

Generally for 'custom' sorts, you want to create a field in your Solr index 
with possibly artificially constructed values that will 'naturally' sort the 
way you want. 

How to do that with a linked list seems kind of tricky: before you index, you 
may have to write code to analyze your whole graph order and then just supply 
sort order keys.  And then if you sometimes update just a few documents, but 
not your whole set... geez, I'm not really sure. It's kind of a tricky 
problem.  That kind of data is not really the expected use case for Solr 
sorting. 

Sorry, I'm not sure what this means or how it would help: use the DIH to 
access a Service in WCF or a Webservice?  Maybe someone else will know exactly 
what you mean. Or maybe if you rephrase with more specificity as to how you 
think this will help you solve your problem, it will be more clear. 

Recall that you don't need to use DIH to index at all, it's just one of several 
methods, it simplifies things for common patterns, it's possible you fall out 
of the common pattern and it would be simpler not to use DIH.   Although even 
without DIH, I can't think of a particularly simple way to solve your problem. 

Just curious, but is your _entire_ corpus, your entire document set, part of a 
_single_ linked list?  Or do you have several different linked lists in there? 
If several, what do you want to happen with sort if two documents in the result 
set aren't even part of the same linked list?   This kind of thing is one 
reason translating the sort of data you have to a solr sort order starts to 
seem kind of confusing to me. 


From: Ezequiel Calderara [ezech...@gmail.com]
Sent: Friday, October 29, 2010 3:39 PM
To: Solr Mailing List
Subject: Custom Sorting in Solr

Hi all guys!
I'm in a weird situation here.
We have indexed a set of documents which are ordered using a linked list (each
document has a reference to the previous and the next).

Is there a way, when sorting in the Solr search, to use the linked list to sort?


If that is not possible, how can I use the DIH to access a service in WCF or
a web service? Should I develop my own DIH?


--
__
Ezequiel.

Http://www.ironicnet.com


Re: documentCache clarification

2010-10-29 Thread Chris Hostetter

: This is a limitation in the SolrCache API.
: The key into the cache does not contain rows, so the cache returns the
: first 10 docs and increments it's hit count.  Then the cache user
: (SolrIndexSearcher) looks at the entry and determines it can't use it.

Wow, I never realized that.

Why don't we just include the start & rows (modulo the window size) in 
the cache key?

-Hoss


Re: Custom Sorting in Solr

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:39 PM, Ezequiel Calderara ezech...@gmail.com wrote:
 Hi all guys!
 I'm in a weird situation here.
 We have indexed a set of documents which are ordered using a linked list (each
 document has a reference to the previous and the next).

 Is there a way, when sorting in the Solr search, to use the linked list to sort?

It seems like you should be able to encode this linked list as an
integer instead, and sort by that?
If there are multiple linked lists in the index, it seems like you
could even use the high bits of the int to designate which list the
doc belongs to, and the low order bits as the order in that list.

-Yonik
http://www.lucidimagination.com
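A minimal sketch of the encoding Yonik describes, in Java; the names, bit
widths, and the assumption that positions are assigned by walking each list
once at index time are all illustrative:

public final class LinkedListSortKey {
  // Pack the list id into the high 32 bits and the position within the
  // list into the low 32 bits, so a plain numeric sort on this value
  // orders documents first by list, then by position inside the list.
  public static long encode(int listId, int position) {
    if (listId < 0 || position < 0) {
      throw new IllegalArgumentException("listId and position must be non-negative");
    }
    return ((long) listId << 32) | (position & 0xFFFFFFFFL);
  }
}

The encoded value would be stored in a sortable numeric field, and sorting on
that field then reproduces the list order.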


Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:49 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : This is a limitation in the SolrCache API.
 : The key into the cache does not contain rows, so the cache returns the
 : first 10 docs and increments it's hit count.  Then the cache user
 : (SolrIndexSearcher) looks at the entry and determines it can't use it.

 Wow, I never realized that.

  Why don't we just include the start & rows (modulo the window size) in
 the cache key?

The implementation of equals() would be rather difficult... actually
impossible w/o abusing the semantics.
It would also be impossible w/o the Map implementation guaranteeing
what object was on the LHS vs the RHS when equals was called.

Unless I'm missing something obvious?

-Yonik
http://www.lucidimagination.com


Re: documentCache clarification

2010-10-29 Thread Chris Hostetter

:  Why don't we just include the start & rows (modulo the window size) in
:  the cache key?
: 
: The implementation of equals() would be rather difficult... actually
: impossible w/o abusing the semantics.
: It would also be impossible w/o the Map implementation guaranteeing
: what object was on the LHS vs the RHS when equals was called.
: 
: Unless I'm missing something obvious?

You've totally confused me.

What I'm saying is that SolrIndexSearcher should consult the window size 
before consulting the cache -- the start param should be rounded down to 
the nearest multiple of the window size, and start+rows (ie: end) should 
be rounded up to one less than the nearest multiple of the window size, 
and then that should be looked up in the cache.

equality on the cache key is straightforward...
   this.q==that.q && this.start==that.start && this.end==that.end && 
   this.sort==that.sort && this.filters==that.filters

so if the window size is 50 and SolrIndexSearcher gets a request like 
q=x&start=33&rows=10&sort=y&fq=... it should 
generate a cache key where start=0 and end=49.  (if start=33&rows=42, then 
the key would contain start=0 and end=99 ... which could result in some 
overlap, but that's why people are supposed to pick a window size greater 
than the largest number of rows typically requested)



-Hoss


Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 4:21 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

  :  Why don't we just include the start & rows (modulo the window size) in
 :  the cache key?
 :
 : The implementation of equals() would be rather difficult... actually
 : impossible w/o abusing the semantics.
 : It would also be impossible w/o the Map implementation guaranteeing
 : what object was on the LHS vs the RHS when equals was called.
 :
 : Unless I'm missing something obvious?

 You've totally confused me.

 What i'm saying is that SolrIndexSearcher should consult the window size
 before consulting the cache -- the start param should be rounded down to
  the nearest multiple of the window size, and start+rows (ie: end) should
  be rounded up to one less than the nearest multiple of the window size,
 and then that should be looked up in the cache.

That's already done.
In the example config, do
q=*:*&rows=12
q=*:*&rows=16
and you should see a queryResultCache hit since queryResultWindowSize
is 20 and both requests round up to that.

*but* if you do this (with an index with more than 20 docs in it)
q=*:*&rows=25

Currently that query will round up to 40, but since nResults
(start+rows) isn't in the key, it will still get a cache hit but then
not be usable.

Now, if your proposal is to put nResults into the key, we then have a
worse problem.
Assume we're starting over with a clean cache.
q=*:*&rows=25   // cached under a key including nResults=40
q=*:*&rows=15  // looked up under a key including nResults=20... not found!

  but that's why people are supposed to pick a window size greater
  than the largest number of rows typically requested)

Hmmm, I don't think so.  If that were the case, there would be no need
for two parameters (no need for queryResultWindowSize) since we would
always just pick queryResultMaxDocsCached.

-Yonik
http://www.lucidimagination.com


SolrCore.getSearcher() and postCommit()

2010-10-29 Thread Grant Ingersoll
Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) in 
a SolrEventListener.postCommit() hook as long as I decrement it when I am done? 
 I need to get a handle on an IndexReader so I can dump out a portion of the 
index to an external process.

Thanks,
Grant

Re: How can I disable fsync()?

2010-10-29 Thread Grant Ingersoll

On Oct 29, 2010, at 2:11 PM, Igor Chudov wrote:

 Thanks to all and I made Solr work very well on one newer machine.
 
 Now I am setting up Solr on an older server with an IDE hard drive.
 
 Unfortunately, populating the index takes FOREVER due to
 Solr/Lucene/Tomcat calling fsync() a lot after every write.
 
 I would like to know how to disable fsync.
 
 I am very aware of the risks of not having fsync() and I DO NOT CARE
 ABOUT THEM AND DO NOT WANT TO BE REMINDED.
 
 I just want to know how can I disable fsync() when adding to Solr index.

Have a look at FSDirectory.fsync().  That's at least a starting point. YMMV.
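A minimal, untested sketch of that starting point against the Lucene 2.9.x
APIs that Solr 1.4 ships with; the class name is illustrative:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.SimpleFSDirectory;

// A Directory that skips the fsync() normally performed on commit,
// trading durability for indexing speed, exactly as discussed above.
public class NoSyncDirectory extends SimpleFSDirectory {
  public NoSyncDirectory(File path) throws IOException {
    super(path);
  }

  @Override
  public void sync(String name) throws IOException {
    // no-op: deliberately do not force dirty pages to disk
  }
}

You would then have to expose it to Solr, e.g. through a custom
DirectoryFactory registered with a directoryFactory element in
solrconfig.xml, assuming your version supports pluggable directory factories.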




Re: SolrCore.getSearcher() and postCommit()

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 5:36 PM, Grant Ingersoll gsing...@apache.org wrote:
 Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) 
 in a SolrEventListener.postCommit() hook as long as I decrement it when I am 
 done?  I need to get a handle on an IndexReader so I can dump out a portion 
 of the index to an external process.

Yes, just be aware that the searcher you will get will not contain the
recently committed documents.
If you want that, look at the newSearcher hook instead.

-Yonik
http://www.lucidimagination.com
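For reference, a minimal sketch of the borrow/release pattern being
described; the method and how the SolrCore is obtained are illustrative:

import org.apache.lucene.index.IndexReader;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// Called from a SolrEventListener hook, with the core saved at init time.
void dumpPartOfIndex(SolrCore core) {
  RefCounted<SolrIndexSearcher> ref = core.getSearcher();
  try {
    IndexReader reader = ref.get().getReader();
    // ... hand the reader to the external dump process ...
  } finally {
    ref.decref();  // always release, or the searcher can never be closed
  }
}

The newSearcher(newSearcher, currentSearcher) hook hands you the new searcher
directly, so in that case no extra getSearcher() call is needed.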


Re: NOT keyword - doesn't work with dismax?

2010-10-29 Thread Scott K
I couldn't even get the bq= to work with negated queries, although
with edismax, negated queries work with just q=-term

Works:
/solr/select?qt=edismax&q=-red

Here is the failed attempt with dismax
/solr/select?qt=dismax&rows=1&indent=true&q=-red&bq=*:*^0.001&echoParams=all&debugQuery=true

{
  "responseHeader":{
    "status":0,
    "QTime":20,
    "params":{
      "mm":"2<-1 5<-2 6<90%",
      "pf":"title^10.0 sbody^2.0",
      "echoParams":"all",
      "tie":"0.01",
      "qf":"title^10.0 sbody^2.0 tags^1.0 text^1.0",
      "q.alt":"*:*",
      "hl.fl":"body",
      "wt":"json",
      "ps":"100",
      "defType":"dismax",
      "bq":"*:*^0.001",
      "echoParams":"all",
      "debugQuery":"true",
      "indent":"true",
      "q":"-red",
      "qt":"dismax",
      "rows":"1"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
    "rawquerystring":"-red",
    "querystring":"-red",
    "parsedquery":"+(-DisjunctionMaxQuery((tags:red | text:red |
title:red^10.0 | sbody:red^2.0)~0.01))
DisjunctionMaxQuery((title:red^10.0 | sbody:red^2.0)~0.01)
MatchAllDocsQuery(*:*^0.0010)",
    "parsedquery_toString":"+(-(tags:red | text:red | title:red^10.0 |
sbody:red^2.0)~0.01) (title:red^10.0 | sbody:red^2.0)~0.01
*:*^0.0010",
    "explain":{},
    "QParser":"DisMaxQParser",
    "altquerystring":null,
    "boost_queries":["*:*^0.001"],
    "parsed_boost_queries":["MatchAllDocsQuery(*:*^0.0010)"],
    "boostfuncs":null,
    "timing":{
      "time":20.0,
      "prepare":{
        "time":19.0,
        "org.apache.solr.handler.component.QueryComponent":{
          "time":19.0},
        "org.apache.solr.handler.component.FacetComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.MoreLikeThisComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.HighlightComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.StatsComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.DebugComponent":{
          "time":0.0}},
      "process":{
        "time":1.0,
        "org.apache.solr.handler.component.QueryComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.FacetComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.MoreLikeThisComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.HighlightComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.StatsComponent":{
          "time":0.0},
        "org.apache.solr.handler.component.DebugComponent":{
          "time":1.0}


On Wed, Apr 28, 2010 at 23:35, Chris Hostetter hossman_luc...@fucit.org wrote:

 : Ah, dismax doesn't support top-level NOT query.

  Hmm, yeah I don't think support for purely negated queries was ever added
 to dismax.

  I'm pretty sure that as a workaround you can add
 something like...
        bq=*:*^0.001
 ...to your query.  based on the dismax structure, that should allow purely
 negative queries to work.



 -Hoss




Solr + Zookeeper Integration

2010-10-29 Thread Claudio Devecchi
Hi people,

I'm trying to configure a little solr cluster but I need to shard the
documents.

I configured my solr with core0 (/opt/solr/core0) and installed the
zookeeper (/opt/zookeeper).

1. On my solrconfig.xml I added the lines below:

<zookeeper>
<str name="zkhostPorts">host1:2181</str>
<str name="me">http://host1:8983/solr/core0</str>
<str name="timeout">5000</str>
<str name="nodesDir">/solr_domain/nodes</str>
 </zookeeper>


2. On my /opt/zookeeper/conf/zoo.cfg I configured this way:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181

And start it with zkServer.sh


After starting the zookeeper, my dir /solr_domain/nodes remains empty.
Following the documentation I couldn't find anything extra to do, but
nothing is working.

Could somebody tell me what is missing or wrong, please?


Thanks


Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-29 Thread Ron Mayer
I have some documents with a bunch of attachments (images, thumbnails
for them, audio clips, word docs, etc); and am currently dealing with
them by just putting a path on a filesystem to them in solr; and then
jumping through hoops of keeping them in sync with solr.

Would it be nuts to stick the image data itself in solr?

More specifically - if I have a bunch of large stored fields,
would it significantly impact search performance in the
cases when those fields aren't fetched?

Searches are very common in this system, and it's very rare
that someone actually opens up one of these attachments
so I'm not really worried about the time it takes to fetch
them when someone does actually want one.



Re: replication not working between 1.4.1 and 3.1-dev

2010-10-29 Thread Shawn Heisey

On 10/27/2010 8:34 PM, Shawn Heisey wrote:
I started to upgrade my slave servers from 1.4.1 to a 3.1-dev build checked 
out this morning.  Because of SOLR-2034 (new javabin version) the 
replication fails.


Asking about it in comments on SOLR-2034 brought up the suggestion of 
switching to XML instead of javabin, but so far I have not been able 
to figure out how to do this.  I filed a new Jira (SOLR-2204) on the 
replication failure.


Is there any way (through either a config change or minor code 
changes) to make the replication handler use XML?  If I have to make 
small edits to the 1.4.1 source as well as 3.1, that would be OK.


Talking to yourself is probably a sign of mental instability, but I'm 
doing it anyway.  There's been deafening silence from everyone else!


The recommended method of safely upgrading Solr that I've read about is 
to upgrade slave servers, keeping your production application pointed 
either at another set of slave servers or your master servers.  Then you 
test it with a dev copy of your application, and once you're sure it's 
working, you can switch production traffic over to the upgraded set.  If 
it falls over, you just switch back to the old version.  Once you're 
sure it's TRULY working, you upgrade everything else.  To convert fully 
to the new index format, you have the option of reindexing or optimizing 
your existing indexes.


I like this method, and this is the way I want to do it, except that the 
new javabin format makes it impossible.  I need a viable way to 
replicate indexes from a set of 1.4.1 master servers to 3.1-dev slaves.  
Delving into the source and tackling the problem myself is something I 
would truly love to do, but I lack the necessary skills.


I believe this will be a showstopper problem if 3.1 is released in its 
current state.


Are there any clever workarounds that would let me proceed with my 
upgrade now?


Thanks,
Shawn



Re: Looking for Developers

2010-10-29 Thread Dennis Gearon
LOL!

We ARE programmers, and we do like absolutes :-)
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/29/10, Lance Norskog goks...@gmail.com wrote:

 From: Lance Norskog goks...@gmail.com
 Subject: Re: Looking for Developers
 To: solr-user@lucene.apache.org, t...@statsbiblioteket.dk
 Date: Friday, October 29, 2010, 3:14 AM
 Then, Godwin!
 
 On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
  For me, I simply deleted the original email, but
 I'm now quite
  enjoying the irony of the complaints causing more
 noise on the list
  than the original email!  ;-)
 
  He he. An old classic. Next in line is the
 meta-meta-discussion about
  whether meta-discussions belong on the list or if they
 should be moved
  to solr-user-meta. Repeat ad nauseam.
 
  Job-postings are on-topic IMHO and unless their volume
 grows
  significantly, I see no reason to create a new mailing
  list.
 
 
 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com



Re: Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-29 Thread Shashi Kant
On Fri, Oct 29, 2010 at 6:00 PM, Ron Mayer r...@0ape.com wrote:

 I have some documents with a bunch of attachments (images, thumbnails
 for them, audio clips, word docs, etc); and am currently dealing with
 them by just putting a path on a filesystem to them in solr; and then
 jumping through hoops of keeping them in sync with solr.



Not sure why that is an issue. Keeping them in sync with Solr would be the
same as storing them within a file system. Why would storing them within Solr
be any different?


 Would it be nuts to stick the image data itself in solr?

 More specifically - if I have a bunch of large stored fields,
 would it significantly impact search performance in the
 cases when those fields aren't fetched.


Hard to say. I assume you mean storing by converting into a base64 format. If
you do not retrieve the field when fetching, AFAIK it should not affect
performance significantly, if at all.
So if you manage your retrieval carefully, you should be fine.
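To make that concrete, a minimal sketch, assuming base64-encoded attachment
data in a stored-only field (names illustrative):

<!-- schema.xml: stored for retrieval, never searched -->
<field name="attachment_data" type="string" indexed="false" stored="true"/>

<!-- solrconfig.xml: only load large stored fields when actually requested -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>

With lazy field loading on and the field left out of the fl parameter of
normal searches, the large values should only be read off disk when a
document's attachment is actually fetched.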


 Searches are very common in this system, and it's very rare
 that someone actually opens up one of these attachments
 so I'm not really worried about the time it takes to fetch
 them when someone does actually want one.