Re: Can Master push data to slave

2011-08-15 Thread Pawan Darira
Regarding point b, I mean that when the Slave server does a replication from
the Master, it creates a lock file in its index directory. How do I avoid that?


On Tue, Aug 9, 2011 at 2:56 AM, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi,

  Hi
 
  I am using Solr 1.4. and doing a replication process where my slave is
  pulling data from Master. I have 2 questions
 
  a. Can Master push data to slave

 Not in current versions. Not sure about exotic patches for this.

  b. How to make sure that lock file is not created while replication

 What do you mean?

 
  Please help
 
  thanks
  Pawan



filtering non english text from my results

2011-08-15 Thread Omri Cohen
Hi All,

I am looking for a solution to filter out text which contains non-English
words. My goal is to present my English-speaking users with results in
their language.

any ideas?

thanks
Omri


Re: sorting issue with solr 3.3

2011-08-15 Thread Bernd Fehling

I have created an issue with test attached.

https://issues.apache.org/jira/browse/SOLR-2713

Will try to figure out what's going wrong.

Regards
Bernd
http://www.base-search.net/


On 13.08.2011 16:20, Bernd Fehling wrote:

The issue was located in a 31 million docs index and I have already reduced it
to a reproducible 4-document index. It is stock Solr 3.3.0.
Yes, the documents are also in the wrong order, not just the field sort values.
I only added the field sort values to the email to keep it short.
I will produce a test on Monday when I'm back in my office.
Hang on...

Regards
Bernd
http://www.base-search.net/


I've checked in an improved TestSort that adds deleted docs and
randomizes things a lot more (and fixes the previous reliance on doc
ids not being reordered).
I still can't reproduce this error though.
Is this stock solr?  Can you verify that the documents are
in the
wrong order also (and not just the field sort values)?

-Yonik
http://www.lucidimagination.com


Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-15 Thread Markus Jelsma
Hi,

 Hi Markus,
 
 thanks for your answer.
 I'm using Solr 4.0 and Jetty now and will observe the behavior and my error
 logs next week.
 Tomcat can be the reason; we will see, I'll report.
 
 I'm indexing WITHOUT batches, one doc after another. But I will try out
 batch indexing as well as retrying faulty docs.
 If you index one batch and one doc in the batch is corrupt, what happens
 to the other 249 docs (250/batch total)? Are they indexed and updated when
 you retry the batch, or does the complete batch fail?

The entire batch should fail but I cannot confirm. Usually all fail if there
is an error somewhere such as an XML error.

 
 Regards
 Vadim
 
 
 
 
 2011/8/11 Markus Jelsma markus.jel...@openindex.io
 
  Hi,
  
  We see these errors too once in a while, but there is no real answer on the
  mailing list, except one user suspecting Tomcat is responsible (connection
  timeouts).
  
  Another user proposed limiting the number of documents per batch but that,
  of course, increases the number of connections made. We do only 250
  docs/batch to limit RAM usage on the client and start to see these errors
  very occasionally. There may be a coincidence... or not.
  
  Anyway, it's really hard to reproduce, if not impossible. It happens when
  connecting directly as well as when connecting through a proxy.

  What you can do is simply retry the batch and it usually works out fine.
  At least you don't lose a batch in the process. We retry all failures at
  least a couple of times before giving up an indexing job.
  
  Cheers,
  
   Hello folks,
   
   i use solr 1.4.1 and every 2 to 6 hours i have indexing errors in my
   log files.
   
   on the client side:
   2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing
   failed with SolrServerException.
   Details: org.apache.commons.httpclient.ProtocolException: Unbuffered
  
  entity
  
   enclosing request can not be repeated.:
  
   Stacktrace:
   org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
    .
   .
   on the server side:
    INFO: [] webapp=/solr path=/update params={wt=javabin&version=1}
    status=0 QTime=3
   04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor
   finish
   INFO: {} 0 0
   04.08.2011 12:01:18 org.apache.solr.common.SolrException log
   SCHWERWIEGEND: org.apache.solr.common.SolrException:
   java.io.IOException: Invalid chunk header
   .
   .
   .
    I'm indexing ONE document per call, 15-20 documents per second, 24/7.
    What may be the problem?
   
   best regards
   vadim
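
A minimal SolrJ sketch of the retry-the-batch approach described above (the
retry count and backoff are illustrative assumptions, not from the thread):

import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class RetryingIndexer {
    private static final int MAX_RETRIES = 3; // assumed value; tune as needed

    // Retry the whole batch on failure so no documents are lost.
    public static void addWithRetry(SolrServer server,
            Collection<SolrInputDocument> batch) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                server.add(batch);
                return;
            } catch (SolrServerException e) {
                if (attempt >= MAX_RETRIES) throw e; // give up the indexing job
                Thread.sleep(1000L * attempt);       // back off before retrying
            }
        }
    }
}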


Re: Nutch related issue: URL Ignore

2011-08-15 Thread Markus Jelsma
The Solr list is not the appropriate list to ask. Please try the Nutch user 
mailing list.

 hi
 
 I am using Nutch 1.2. In my crawl-urlfilter.txt, I am specifying URLs to be
 skipped. I am giving some patterns that need to be skipped but it is not
 working.
 
 e.g.
 
 -^http://([a-z0-9]*\.)*domain.com
 +^http://([a-z0-9]*\.)*domain.com/([0-9-a-z])*.html
 -^http://([a-z0-9]*\.)*domain.com/([a-z/])*
 -^http://([a-z0-9]*\.)*domain.com/top-ads.php
 
 I want only the second URL pattern to be included while crawling and all
 other patterns to be excluded, but it is crawling all of them. Please
 suggest where the issue might be.
 
 thanks
 Pawan


Migration from Autonomy IDOL to SOLR

2011-08-15 Thread Arcadius Ahouansou
Hello.

We have a couple of applications running on half a dozen Autonomy IDOL
servers.
Currently, all features we need are supported by Solr.

We have done some internal testing and realized that SOLR would do a better
job.

So, we are investigating all possibilities for a smooth migration from IDOL
to SOLR.

I am looking for advice from people who went through something similar.

Ideally, we would like to keep most of our legacy code unchanged and have a
kind of query-translation-layer plugged into our app if possible.

- Is there a lib available?

- Any thoughts?

Thanks.

Arcadius.


Re: strip html from data

2011-08-15 Thread Merlin Morgenstern
2011/8/11 Ahmet Arslan iori...@yahoo.com

  Is there a way to strip the HTML tags completely and not index them? If
  not, how do I retrieve the results without HTML tags?

 How do you push documents to solr? You need to strip html tags before the
 analysis chain. For example, if you are using Data Import Handler, you can
 use HTMLStripTransformer.

  http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
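
A minimal DIH sketch of that transformer (table and column names are
hypothetical):

<entity name="page" transformer="HTMLStripTransformer"
        query="SELECT id, body FROM pages">
  <field column="id"/>
  <!-- tags are stripped before the analysis chain sees the text -->
  <field column="body" stripHTML="true"/>
</entity>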


Thank you everybody for your help and all the detailed explanations. This
solution fixed the problem.

Best regards.


Invalid Date String for highlighting any date field match

2011-08-15 Thread baronDodd
I must be missing something..

It appears to me that with Solr 3.2 and 3.3, if you highlight on a date field
(e.g. by searching on *:*) the application blows up with:

ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:
Invalid Date String:'1306406051000'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at
org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106)
at
org.apache.solr.analysis.TrieTokenizer.init(TrieTokenizerFactory.java:76)
at
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51)
at
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41)
at
org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68)
at
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75)
at
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
at
org.apache.solr.highlight.DefaultSolrHighlighter.createAnalyzerTStream(DefaultSolrHighlighter.java:550)

I am using solrj beans to save Date objects to a schema type of 'date' or
'tdate' - makes no difference.

From what I can see this code will never work, as the DefaultSolrHighlighter
passes the date as a millisecond string all the way down to the
TrieTokenizer, which calls DateField.parseMath(), and this immediately
rejects anything that is not formatted as a date string.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Invalid-Date-String-for-highlighting-any-date-field-match-tp3255469p3255469.html
Sent from the Solr - User mailing list archive at Nabble.com.


A strange Exception in Solr 1.4

2011-08-15 Thread weiwei fu
java.lang.NullPointerException



Hi,
I hit a NullPointerException in Solr 1.4. The params are
params={q=s_id:112511+AND+b_id:332133&defType=lucene} status=500
QTime=1

   2011-08-15 10:31:24,968 ERROR [org.apache.solr.core.SolrCore] -
java.lang.NullPointerException
at sun.nio.ch.Util.free(Util.java:199)
at sun.nio.ch.Util.offerFirstTemporaryDirectBuffer(Util.java:176)
at sun.nio.ch.IOUtil.read(IOUtil.java:181)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612)
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
at
org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247)
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
at
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
at
org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
at
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
at
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
at
org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
at
org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
at
org.apache.lucene.index.FilterIndexReader.docFreq(FilterIndexReader.java:194)
at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:344)
at
org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
at
org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147)
at
org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765)
at
org.apache.lucene.search.TermQuery$TermWeight.init(TermQuery.java:46)
at
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:184)
at
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415)
at org.apache.lucene.search.Query.weight(Query.java:99)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at
com.taobao.terminator.core.realtime.DefaultSearchService.query(DefaultSearchService.java:197)
at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222)
at
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174)
at
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41)
at
com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)



  Thank you





allen.Fu


RE: filtering non english text from my results

2011-08-15 Thread Jaeger, Jay - DOT
1.  Find a dictionary with the English words you find acceptable
2.  Use the KeepWordFilterFactory (documented on the AnalyzersTokenizersTokenFilters
wiki page), as sketched below.
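
A minimal schema.xml sketch of that approach (the field type name and the
word list file name are illustrative):

<fieldType name="text_en_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop every token that is not in the English word list -->
    <filter class="solr.KeepWordFilterFactory" words="english-words.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>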

-Original Message-
From: Omri Cohen [mailto:omri...@gmail.com] 
Sent: Monday, August 15, 2011 1:23 AM
To: solr-user@lucene.apache.org
Subject: filtering non english text from my results

Hi All,

I am looking for a solution to filter out text which contains non-English
words. My goal is to present my English-speaking users with results in
their language.

any ideas?

thanks
Omri


Re: parsing many documents takes too long

2011-08-15 Thread Erik Hatcher
Sounds like you aren't using SolrJ, which will return a Java object back to you 
natively.   Give that a try and let us know how it fares against the JAXB 
method.

Erik
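
A minimal sketch of the SolrJ bean binding Erik is referring to (the bean
fields and server URL are illustrative):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class Item {
    @Field String id;   // bound to the Solr "id" field
    @Field String name; // bound to the Solr "name" field

    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        QueryResponse rsp = server.query(new SolrQuery("*:*").setRows(1000));
        List<Item> items = rsp.getBeans(Item.class); // no manual parsing
        System.out.println(items.size() + " beans");
    }
}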

On Aug 12, 2011, at 02:58 , Tri Nguyen wrote:

 Hi,
  
 My results from Solr return about 982 documents and I use JAXB to parse them
 into Java objects, which takes about 469 ms, over my 150-200ms threshold.
  
 Is there a solution around this? Can I store the Java objects in the index
 and return them in the Solr response and then deserialize them back into
 Java objects? Would this take less time?
  
 Any other ideas?
  
 Thanks,
  
 Tri



RE: ideas for indexing large amount of pdf docs

2011-08-15 Thread Jaeger, Jay - DOT
Note on i:  Solr replication provides pretty good clustering support 
out-of-the-box, including replication of multiple cores.  Read the Wiki on 
replication (Google +solr +replication if you don't know where it is).  

In my experience, the problem with indexing PDFs is it takes a lot of CPU on 
the document parsing side (client), not on the Solr server side.  So make sure 
you do that part on the client and not the server.
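
A minimal sketch of doing that extraction client-side with Tika before
sending plain text to Solr (the class is illustrative; it assumes the Tika
jar is on your classpath):

import java.io.File;
import org.apache.tika.Tika;

public class PdfExtractor {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // The CPU-heavy PDF parsing happens here on the client,
        // not on the Solr server.
        String text = tika.parseToString(new File(args[0]));
        System.out.println("Extracted " + text.length() + " characters");
    }
}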

Avoiding iii:


Suggest that you write yourself a multi-threaded performance test so that you 
aren't guessing what your performance will be.

We wrote one in Perl.  It handles an individual thread (we were testing 
inquiry), and we wrote a little batch file / shell script to start up the 
desired number of threads.

The main statement in our batch file follows (the rest just sets the variables). A
shell script would be even easier.

for /L %%i in (1,1,%THREADS%) DO start /B perl solrtest.pl -h %SOLRHOST% 
-c %COUNT% -u %1 -p %2 -r %SOLRREALM% -f %SOLRLOC%\firstsynonyms.txt -l 
%SOLRLOC%\lastsynonyms.txt -z %FUZZ%

The Perl script:


#!/usr/bin/perl

#
#   Perl program to run a thread of solr testing
#

use Getopt::Std;            # For options processing
use POSIX;                  # For time formatting
use XML::Simple;            # For processing of XML config file
use Data::Dumper;           # For debugging XML config file
use HTTP::Request::Common;  # For HTTP request to Solr
use HTTP::Response;
use LWP::UserAgent;         # For HTTP request to Solr

$host = "YOURHOST:8983";
$realm = "YOUR AUTHENTICATION REALM";
$firstlist = "firstsynonyms.txt";
$lastlist = "lastsynonyms.txt";
$fuzzy = "";

$me = $0;

sub usage() {
    print "perl $me -c iterations [-d] [-h host:port] [-u user [-p password]]\n";
    print "\t\t[-f firstnamefile] [-l lastnamefile] [-z fuzzy] [-r realm]\n";
    exit(8);
}


#
#   Process the command line options, and open the output file.
#

getopts('dc:u:p:f:l:h:r:z:') || usage();

if(!$opt_c) {
    usage();
}

$count = $opt_c;

if($opt_u) {
    $user = $opt_u;
}

if($opt_p) {
    $password = $opt_p;
}

if($opt_h) {
    $host = $opt_h;
}

if($opt_f) {
    $firstlist = $opt_f;
}

if($opt_l) {
    $lastlist = $opt_l;
}

if($opt_r) {
    $realm = $opt_r;
}

if($opt_z) {
    $fuzzy = "~" . $opt_z;
}

$debug = $opt_d;


#
#   If the host string does not include a :, add :80
#

if($host !~ /:/) {
    $host = $host . ":80";
}

#
#   Read the lists of first and last names
#

open(SYNFILE,$firstlist) || die "Can't open first name list $firstlist\n";
while(<SYNFILE>) {
    @newwords = split /,/;
    for($i=0; $i <= $#newwords; ++$i) {
        $newwords[$i] =~ s/^\s+//;
        $newwords[$i] =~ s/\s+$//;
        $newwords[$i] = lc($newwords[$i]);
    }
    push @firstnames, @newwords;
}
close(SYNFILE);

open(SYNFILE,$lastlist) || die "Can't open last name list $lastlist\n";
while(<SYNFILE>) {
    @newwords = split /,/;
    for($i=0; $i <= $#newwords; ++$i) {
        $newwords[$i] =~ s/^\s+//;
        $newwords[$i] =~ s/\s+$//;
        $newwords[$i] = lc($newwords[$i]);
    }
    push @lastnames, @newwords;
}
close(SYNFILE);


print "$#firstnames First Names, $#lastnames Last Names\n";
print "User: $user\n";


my $userAgent = LWP::UserAgent->new(agent => 'solrtest.pl');
$userAgent->credentials($host,$realm,$user,$password);

$uri = "http://$host/solr/select";

$starttime = time();

for($c=0; $c < $count; ++$c) {
    $fname = $firstnames[rand $#firstnames];
    $lname = $lastnames[rand $#lastnames];
    $response = $userAgent->request(
        POST $uri,
        [
            q => "lnamesyn:$lname AND fnamesyn:$fname$fuzzy",
            rows => 25
        ]);

    if($debug) {
        print "Query: lnamesyn:$lname AND fnamesyn:$fname$fuzzy";
        print $response->content();
    }
    print "POST for $fname $lname completed, HTTP status=" .
        $response->code . "\n";
}

$elapsed = time() - $starttime;
$average = $elapsed / $count;

print "Time: $elapsed s ($average/request)\n";


-Original Message-
From: Rode Gonzalez (libnova) [mailto:r...@libnova.es] 
Sent: Saturday, August 13, 2011 3:50 AM
To: solr-user@lucene.apache.org
Subject: ideas for indexing large amount of pdf docs

Hi all,

I want to ask about the best way to implement a solution for indexing a
large number of PDF documents, between 10-60 MB each, with 100 to 1000 users
connected simultaneously.

I currently have 1 core of Solr 3.3.0 and it works fine for a small number of
PDF docs, but I'm worried about the moment we enter production.

some possibilities:

i. Clustering. I have no experience in this, so it would be a bad idea to
venture into it.

ii. Multicore solution: make some kind of hash to choose one core for each
query (exact queries) and thus reduce 

Re: Exception DirectSolrSpellChecker when using spellcheck.q

2011-08-15 Thread Robert Muir
What Subversion revision are you using? I think you just need to svn
up; from the line number I can tell it's before I fixed this bug in
trunk :)

On Fri, Aug 12, 2011 at 11:36 AM, O. Klein kl...@octoweb.nl wrote:
 Spellchecker works fine, but when using spellcheck.q it gives the following
 exception (queryAnalyzerFieldType is defined, if that would matter).

 Is it a bug or am I doing something wrong?

 2011-08-12 17:30:54,368 java.lang.NullPointerException
        at
 org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
        at
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
        at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1401)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Exception-DirectSolrSpellChecker-when-using-spellcheck-q-tp3249565p3249565.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
lucidimagination.com


Solr + Arabic Search

2011-08-15 Thread Rohit
I am trying to search an Arabic keyword in Solr, but am just unable to do so. I
have successfully indexed Arabic but the search doesn't seem to be working.

 

Search URL:

 

http://localhost:8080/solr/tw/select/?q=%D8%AA%D8%A3%D8%AC%D9%8A%D8%B1%20%D8%A7%D9%84%D8%A7%D9%87%D9%84%D9%8A

 

The response :

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="q">تأجير الاهلي</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me: http://about.me/rohitg

 



SolrJ and ContentStreams

2011-08-15 Thread Marcus Paradies
Hi 

I'm considering using SolrJ to run queries in an MLT fashion against my Solr
server. I saw that there is already an open bug filed in Jira
(https://issues.apache.org/jira/browse/SOLR-1085).

My question is: 

Is it possible to use content streams to pass a data stream to the MLT
handler in SolrJ?

Ideally I'd like to do something like 

http://localhost:8983/solr/mlt?stream.body=electronics%20memory&mlt.fl=manu,cat&mlt.interestingTerms=list&mlt.mintf=0

in SolrJ. Currently I'm defining most of the MLT specific parameters in the
solrconfig.xml. Is that possible in SolrJ?

Thanks,
Marcus

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-and-ContentStreams-tp3256237p3256237.html
Sent from the Solr - User mailing list archive at Nabble.com.
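
One possible workaround (untested) is to send the MLT parameters, including
stream.body, as plain request parameters against the /mlt handler; whether
stream.body is honored this way depends on your Solr version and config:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MltStreamExample {
    public static void main(String[] args) throws Exception {
        SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/mlt"); // assumes /mlt is registered in solrconfig.xml
        params.set("stream.body", "electronics memory");
        params.set("mlt.fl", "manu,cat");
        params.set("mlt.interestingTerms", "list");
        params.set("mlt.mintf", 0);
        QueryResponse rsp = server.query(params);
        System.out.println(rsp.getResponse());
    }
}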


Re: Solr + Arabic Search

2011-08-15 Thread Ahmet Arslan

 I am trying to search an Arabic keyword
 in Solr, but am just unable to do so. I have successfully
 indexed Arabic but the search doesn't seem to be working.

Could it be URI encoding of your servlet container?
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Does the 'match all docs' query q=*:*&defType=lucene return something?
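
For reference, the wiki page above boils down to setting URIEncoding on the
Tomcat connector, roughly like this (the other attributes are illustrative):

<!-- server.xml -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>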


Minimum score filter

2011-08-15 Thread Donald J. Organ IV
Is there a way to set a minimum score requirement so that matches below a given
score are not returned/included in facet counts?

RE: Solr + Arabic Search

2011-08-15 Thread Rohit
Thanks Ahmet, this was the problem I guess.

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: 15 August 2011 22:20
To: solr-user@lucene.apache.org
Subject: Re: Solr + Arabic Search


 I am trying to search an Arabic keyword
 in Solr, but am just unable to do so. I have successfully
 indexed Arabic but the search doesn't seem to be working.

Could it be URI encoding of your servlet container?
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Does the 'match all docs' query q=*:*&defType=lucene return something?



Re: Migration from Autonomy IDOL to SOLR

2011-08-15 Thread Alexei Martchenko
This might be a longshot, but... Adobe is deprecating Verity in the ColdFusion
engine. Version 9 ships both engines, but I believe CF10 will only have Solr
bundled. IDOL is the-new-Verity since Autonomy acquired Verity. Although
Adobe wraps Solr to work like the old Verity, there might be some info from
people who migrated from Verity to Solr a few years ago.

Sorry for not helping much, but sometimes these little bits of information
lead to something.

2011/8/15 Arcadius Ahouansou arcad...@menelic.com

 Hello.

 We have a couple of applications running on half a dozen Autonomy IDOL
 servers.
 Currently, all features we need are supported by Solr.

 We have done some internal testing and realized that SOLR would do a better
 job.

 So, we are investigating all possibilities for a smooth migration from IDOL
 to SOLR.

 I am looking for advice from people who went through something similar.

 Ideally, we would like to keep most of our legacy code unchanged and have a
 kind of query-translation-layer plugged into our app if possible.

 - Is there a lib available?

 - Any thoughts?

 Thanks.

 Arcadius.




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Minimum score filter

2011-08-15 Thread simon
The absolute value of a relevance score doesn't have a lot of meaning and
the range of scores can vary a lot depending on any boost you may apply.
Even if you normalize them (say on a 1-100 scale where 100 is the max
relevance) you can't really draw any valid conclusions from those values.

It would help if you described exactly what problem you're trying to solve.

-Simon

On Mon, Aug 15, 2011 at 1:02 PM, Donald J. Organ IV
dor...@donaldorgan.comwrote:

 Is there a way to set a minimum score requirement so that matches below a
 given score are not returned/included in facet counts?
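
That said, if a hard cutoff is still wanted despite the caveats above, one
technique sometimes used is the frange parser over the query() function; the
threshold of 5 is purely illustrative and, per the above, its absolute value
means little:

fq={!frange l=5}query($q)

Because this is a filter query, it also constrains the facet counts.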


Re: Minimum score filter

2011-08-15 Thread Donald J. Organ IV
OK, I am doing a search using the following fields: name^2.0 code^1.8
cat_search^1.5 description^0.8

I am searching for: free range dog nips

I am getting back 2 documents. The first is the document I am looking for, and
contains those words in the name field, as the name field is Free Range Dog
Nip Chicken Breast Wraps.


The second looks like it's matching because those words are contained within
the description.



- Original Message -
From: simon mtnes...@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, August 15, 2011 1:59:17 PM
Subject: Re: Minimum score filter

The absolute value of a relevance score doesn't have a lot of meaning and
the range of scores can vary a lot depending on any boost you may apply.
Even if you normalize them (say on a 1-100 scale where 100 is the max
relevance) you can't really draw any valid conclusions from those values.

It would help if you described exactly what problem you're trying to solve.

-Simon

On Mon, Aug 15, 2011 at 1:02 PM, Donald J. Organ IV


Re: Tomcat7 with Solr closes at fixed hours, every time another hour

2011-08-15 Thread Chris Hostetter
: 
: I have Solr running within Tomcat 7 and Tomcat is shutting down at
: fixed hours, every time a different hour. catalina.log doesn't show
: anything other than a clean Tomcat shutdown (no exception or
: anything). I would really appreciate some advice on how to debug this.
: Tomcat doesn't run anything other than Solr.

This doesn't appear to be related to Solr.  You can see from your logs 
that the command originates from outside of Solr -- I suspect you would see 
the same problem if you ran a Tomcat instance on this port w/o using Solr 
at all.

My guess is you have a rogue cron command, either running on the local 
machine or using the remote shutdown port, telling Tomcat to shut down.  
(Perhaps it's looking for Tomcat ports whose logs suggest they aren't 
getting a lot of traffic?  Or aren't registered with a load balancer?)

You might want to start by making sure you have remote shutdown support 
disabled...
https://tomcat.apache.org/tomcat-7.0-doc/security-howto.html#Server

...and checking the crontab on the local machine to see what runs on the 
hour.
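
For reference (not from the thread), the Tomcat security page's advice
amounts to something like this in server.xml, where a port of -1 disables
the shutdown listener entirely:

<Server port="-1" shutdown="SHUTDOWN">

If you keep the port, at least change the shutdown string to something
non-guessable.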



-Hoss


Re: Why is boost not always listed in explain when debug is on?

2011-08-15 Thread Chris Hostetter

: using Solr Specification Version: 4.0.0.2011.08.09.11.02.13
: 
: While trying understand scoring I noticed that boost is intermittently
: displayed in the explain. For example, using edismax and the query string is

Hmmm... that output is strange.  It's not just the boost that's missing; 
all of the details about the queryWeight part of the score 
from the name:starbucks clause are missing, and only the fieldWeight 
is listed...

:   8.609147 = (MATCH) weight(name:starbucks^20.0 in 163) [DefaultSimilarity],
: result of:
: 8.609147 = fieldWeight in 163, product of:
:   1.0 = tf(freq=1.0), with freq of:
: 1.0 = termFreq=1
:   8.609147 = idf(docFreq=8644, maxDocs=17433139)
:   1.0 = fieldNorm(doc=163)

...I honestly have no idea what would cause that ... my best guess is that 
maybe with the boost that high the queryWeight winds up being 1.0 and 
the score Explanation code leaves it out since it doesn't affect things?


-Hoss


Re: Migration from Autonomy IDOL to SOLR

2011-08-15 Thread Arcadius Ahouansou
Hi Alexei.
I had a quick look and it seems that Adobe provides their CF tag as a
wrapper around the Verity/Solr API; therefore, the application code is not
polluted with client-specific API calls.
This makes app migration easier.

Thanks for the input.

Arcadius.


On Mon, Aug 15, 2011 at 6:46 PM, Alexei Martchenko 
ale...@superdownloads.com.br wrote:

 This might be a longshot, but... Adobe is deprecating Verity in the ColdFusion
 engine. Version 9 ships both engines, but I believe CF10 will only have Solr
 bundled. IDOL is the-new-Verity since Autonomy acquired Verity. Although
 Adobe wraps Solr to work like the old Verity, there might be some info from
 people who migrated from Verity to Solr a few years ago.

 Sorry for not helping much, but sometimes these little bits of information
 lead to something.

 2011/8/15 Arcadius Ahouansou arcad...@menelic.com

  Hello.
 
  We have a couple of applications running on half a dozen Autonomy IDOL
  servers.
  Currently, all features we need are supported by Solr.

  We have done some internal testing and realized that SOLR would do a
  better job.

  So, we are investigating all possibilities for a smooth migration from
  IDOL to SOLR.

  I am looking for advice from people who went through something similar.

  Ideally, we would like to keep most of our legacy code unchanged and have
  a kind of query-translation-layer plugged into our app if possible.

  - Is there a lib available?

  - Any thoughts?
 
  Thanks.
 
  Arcadius.
 



 --

 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533



Indexing from a database via SolrJ

2011-08-15 Thread Shawn Heisey
Is there a simple way to get all the fields from a jdbc resultset into a 
bunch of SolrJ documents, which I will then send to be indexed in Solr?  
I would like to avoid the looping required to copy the data one field at 
a time.  Copying it one document at a time would be acceptable, but it 
would be nice if there was a way to copy them all at once.


Another idea that occurred to me is to add the dataimporter jar to my 
project and leverage it to do the heavy lifting, but I will need some 
pointers about what objects and methods to research.  Is that a 
reasonable idea, or is it too integrated into the server code to be used 
with SolrJ?


Can anyone point me in the right direction?

Thanks,
Shawn



Re: defType argument weirdness

2011-08-15 Thread Chris Hostetter

: Huh, I'm still not completely following. I'm sure it makes sense if you
: understand the underlying implementation, but I don't understand how 'type'
: and 'defType' don't mean exactly the same thing, just need to be expressed
: differently in different locations.
...
: prefixing def to type is not making it very clear what the difference is!
: What's def supposed to stand for anyway?)

def == default.

type and defType both select a QParser, but they select the QParser for 
parsing different levels of sub queries.  

type can only be used as a localparam and it is how you instruct Solr as 
to what QParser you want it to use when parsing *that* specific query 
string.

defType can be used as either a top level param or as a localparam, to 
specify the default value for the type of QParser you want used for 
the main query string at that level.

Here's an example I just used last week in a project (isfdb-solr) that 
shows what I mean...

q={!boost b=sum(views,annualviews) defType=dismax v=$qq}

...that's just syntactic sugar for...

q={!type=boost b=sum(views,annualviews) defType=dismax v=$qq}

The type localparam (of q) says that the q param should be parsed 
using the boost QParser (which is what knows to parse the b param as a 
function and how to use it) regardless of whatever top level defType 
param might be specified.

the defType localparam then says that when parsing the main sub query 
(the qq param in this case) the default value assumed for the type 
local param should be dismax.

so if i have this...

q={!boost b=sum(M,N) defType=dismax v=$qq}&qq=XXX

that will result in XXX being parsed using the dismax QParser.

...but if i have this...

q={!boost b=sum(M,N) defType=dismax v=$qq}&qq={!type=lucene}XXX

...then the defType localparam is ignored and XXX is parsed using the 
lucene QParser (type overrides defType).

but defType only applies the default for the main query one level down 
... it doesn't recurse forever (and it doesn't apply to secondary query 
string parsing like fq or facet.query or the b function in the boost 
QParser) so if you have something like this...

q={!boost b=sum(M,N) defType=XXX v=$qq}&qq={!lucene v=$zz}&zz=CCC

that defType=XXX won't be used when parsing CCC (because it's one level 
removed)



-Hoss


Product data schema question

2011-08-15 Thread Steve Cerny
I'm working on an online eCommerce project and am having difficulties
building the core / index schema.  Here is the way we organize our product
information in a normalized database.

A product model has many SKUs (called colorways)
A SKU has many sizes (called variants)
A SKU size has associated inventory (called variant inventory)

When we setup our product core we have the following field information

Doc
* brand
* model name
* SKU
* color name

Sample records are as follows

* Haynes, Undershirt, 1234, white
* Haynes, Undershirt, 1235, grey
* Fruit of the Loom, Undershirt, 1236, white
* Fruit of the Loom, Underwear, 1237, grey

The issue I'm having is I want to add inventory to each size of each SKU for
faceting.  Example,

SKU 1234 has sizes small, medium, large.  Size small has 5 in stock, size
medium 10, and size large 25.

In a normalized database I would have a separate table just for inventory,
related back to the SKU with a foreign key. How do I store size and
inventory information effectively with Solr?

-- 
Steve
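
One common approach (an assumption on this editor's part, since no reply is
quoted here) is to denormalize: index one document per SKU/size combination
so inventory becomes a plain facetable field. A sketch with illustrative
field names:

<add>
  <doc>
    <field name="id">1234-small</field> <!-- SKU + size as the unique key -->
    <field name="sku">1234</field>
    <field name="brand">Haynes</field>
    <field name="model">Undershirt</field>
    <field name="color">white</field>
    <field name="size">small</field>
    <field name="inventory">5</field>
  </doc>
  <doc>
    <field name="id">1234-medium</field>
    <field name="sku">1234</field>
    <field name="brand">Haynes</field>
    <field name="model">Undershirt</field>
    <field name="color">white</field>
    <field name="size">medium</field>
    <field name="inventory">10</field>
  </doc>
</add>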


hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-15 Thread Alexei Martchenko
I'm having some trouble trying to upgrade my old highlighter
from the highlighting/fragmenter/formatter format (the 1.4-era default
config from the Solr website) to the new FastVectorHighlighter.

I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
in the config.

In my solrconfig.xml I added these lines:

in the default request handler:

<bool name="hl.useFastVectorHighlighter">true</bool>
<bool name="hl.usePhraseHighlighter">true</bool>
<bool name="hl.highlightMultiTerm">true</bool>
<str name="hl.fragmentsBuilder">colored</str>

and

<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
         <b style="background:yellow">,<b style="background:lawgreen">,
         <b style="background:aquamarine">,<b style="background:magenta">,
         <b style="background:palegreen">,<b style="background:coral">,
         <b style="background:wheat">,<b style="background:khaki">,
         <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

All I get is: ('grave' means severe)

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder:
colored
at
org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)

at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Docs in http://wiki.apache.org/solr/HighlightingParameters say:

hl.fragmentsBuilder

Specify the name of SolrFragmentsBuilder
(http://wiki.apache.org/solr/SolrFragmentsBuilder). Since Solr 3.1
(http://wiki.apache.org/solr/Solr3.1). This parameter makes sense for
FastVectorHighlighter (http://wiki.apache.org/solr/FastVectorHighlighter)
only.

SolrFragmentsBuilder (http://wiki.apache.org/solr/SolrFragmentsBuilder)
respects the hl.tag.pre/post parameters:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
         <b style="background:yellow">,<b style="background:lawgreen">,
         <b style="background:aquamarine">,<b style="background:magenta">,
         <b style="background:palegreen">,<b style="background:coral">,
         <b style="background:wheat">,<b style="background:khaki">,
         <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>


-- 

*Alexei*


Re: Indexing from a database via SolrJ

2011-08-15 Thread Arcadius Ahouansou
Hi Shawn.

Unless you are doing complex pre-processing before indexing, you may want to
have a look at:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS

That should take care of it without any coding.

You may need to periodically do a HTTP GET to trigger the import.
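
A minimal data-config.xml sketch for that wiki recipe (driver, connection
details, and column names are placeholders):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>

The periodic trigger is then a GET to /solr/dataimport?command=full-import
(or delta-import).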


Arcadius.

On Mon, Aug 15, 2011 at 11:25 PM, Shawn Heisey s...@elyograg.org wrote:

 Is there a simple way to get all the fields from a jdbc resultset into a
 bunch of SolrJ documents, which I will then send to be indexed in Solr?  I
 would like to avoid the looping required to copy the data one field at a
 time.  Copying it one document at a time would be acceptable, but it would
 be nice if there was a way to copy them all at once.

 Another idea that occurred to me is to add the dataimporter jar to my
 project and leverage it to do the heavy lifting, but I will need some
 pointers about what objects and methods to research.  Is that a reasonable
 idea, or is it too integrated into the server code to be used with SolrJ?

 Can anyone point me in the right direction?

 Thanks,
 Shawn




-- 
W: www.menelic.com
---


Score

2011-08-15 Thread Bill Bell
How do I change the score to scale it between 0 and 100 regardless of the
raw score?

q.alt=*:*&bq=lang:Spanish&defType=dismax

Bill Bell
Sent from mobile



Re: Score

2011-08-15 Thread Shashi Kant
https://wiki.apache.org/lucene-java/ScoresAsPercentages
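
The page above explains why score normalization is usually misleading; if
you decide to do it anyway, it has to happen client-side, for example with
SolrJ (a sketch; it assumes the score pseudo-field is requested so Solr
returns raw scores):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ScoreScaler {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.set("defType", "dismax");
        q.set("q.alt", "*:*");
        q.set("bq", "lang:Spanish");
        q.setFields("*", "score"); // ask Solr to return the raw score
        SolrDocumentList docs = server.query(q).getResults();
        float max = docs.getMaxScore();
        for (SolrDocument doc : docs) {
            float raw = (Float) doc.getFieldValue("score");
            // 0-100 relative to this result set only
            System.out.println(doc.getFieldValue("id") + " -> "
                    + (100.0f * raw / max));
        }
    }
}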



On Mon, Aug 15, 2011 at 8:13 PM, Bill Bell billnb...@gmail.com wrote:

 How do I change the score to scale it between 0 and 100 regardless of the
 raw score?

 q.alt=*:*&bq=lang:Spanish&defType=dismax

 Bill Bell
 Sent from mobile




Re: Indexing from a database via SolrJ

2011-08-15 Thread Shawn Heisey

On 8/15/2011 5:55 PM, Arcadius Ahouansou wrote:

Hi Shawn.

Unless you are doing complex pre-processing before indexing, you may want to
have a look at:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS

That should take care of it without any coding.

You may need to periodically do a HTTP GET to trigger the import.


I'm aware of this, and my current build system written in Perl works 
this way.  When I need to do a full index rebuild, I will still use the 
DIH, but it has become too limiting for regular indexing needs.  It will 
be inadequate for things that we have in development.  We need more 
flexibility, so I want to handle the interface to the DB myself 
and index directly with SolrJ.


Thanks,
Shawn
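
For the archives, a sketch of the generic copy described above, using
ResultSetMetaData so no per-field code is needed (connection details and the
table are placeholders; it assumes column names match Solr field names):

import java.sql.*;
import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT * FROM item");
        ResultSetMetaData meta = rs.getMetaData();

        Collection<SolrInputDocument> docs =
                new ArrayList<SolrInputDocument>();
        while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            // One generic loop instead of one line of code per field.
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                doc.addField(meta.getColumnLabel(i), rs.getObject(i));
            }
            docs.add(doc);
        }
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.add(docs);
        server.commit();
        conn.close();
    }
}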