Re: Phrase between quotes with dismax edismax

2011-11-16 Thread Jean-Claude Dauphin
Thanks, Erick, for your quick answer.

I am using Solr 3.1

1) I have set the mm parameter to 0 and removed the categories from the
search, so the query is only for "chef de projet" and nothing else.
But the problem remains, i.e. searching for "chef de projet" gives no
results while searching for "chef projet" gives the right result.

Here is an excerpt from the test I made:

DISMAX query (q)=("chef de projet")

=The Parameters=

*queryResponse*=[{responseHeader={status=0,QTime=157,

params={facet=true,

f.createDate.facet.date.start=NOW/DAY-6DAYS,tie=0.1,

facet.limit=4,

f.location.facet.limit=3,

*q.alt*=*:*,

facet.date.other=all,

hl=true,version=2,

*bq*=[categoryPayloads:category1071^1,
categoryPayloads:category10055078^1, categoryPayloads:category10055405^1],

fl=*,score,

debugQuery=true,

facet.field=[soldProvisions, contractTypeText, nafCodeText, createDate,
wage, keywords, labelLocation, jobCode, organizationName,
requiredExperienceLevelText],

*qs*=3,

qt=edismax,

facet.date.end=NOW/DAY,

*mm*=0,

facet.mincount=1,

facet.date=createDate,

*qf*= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
categoryPayloads^1.0,

hl.fl=title,

wt=javabin,

rows=20,

start=0,

*q*=("chef de projet"),

facet.date.gap=+1DAY,

*stopwords*=false,

*ps*=3}},

The Solr Response
response={numFound=0

Debug Info

debug={

*rawquerystring*=("chef de projet"),

*querystring*=("chef de projet"),

*---*

*parsedquery*=

+*DisjunctionMaxQuery*((title:"chef de projet"~3^4.0 | keywords:"chef de
projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" |
formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de
projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de
projet"~3 | labelLocation:"chef de projet")~0.1)
*DisjunctionMaxQuery*((title:"(chef
chef) de (projet) projet"~3^4.0)~0.1) categoryPayloads:category1071
categoryPayloads:category10055078 categoryPayloads:category10055405,

*---*

*parsedquery_toString*=+(title:"chef de projet"~3^4.0 | keywords:"chef de
projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" |
formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de
projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de
projet"~3 | labelLocation:"chef de projet")~0.1 (title:"(chef chef) de
(projet) projet"~3^4.0)~0.1 categoryPayloads:category1071
categoryPayloads:category10055078 categoryPayloads:category10055405,



explain={},

QParser=ExtendedDismaxQParser,altquerystring=null,

*boost_queries*=[categoryPayloads:category1071^1,
categoryPayloads:category10055078^1, categoryPayloads:category10055405^1],

*parsed_boost_queries*=[categoryPayloads:category1071,
categoryPayloads:category10055078, categoryPayloads:category10055405],
boostfuncs=null,

2) I tried to remove the bq values but no changes:

*querystring*=("chef de projet"),

*parsedquery*=+*DisjunctionMaxQuery*((title:"chef de projet"~3^4.0 |
keywords:"chef de projet"^3.0 | organizationName:"chef de projet" |
location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 |
nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 |
categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1)
*DisjunctionMaxQuery*((title:"(chef chef) de (projet)
projet"~3^4.0)~0.1),
*parsedquery_toString*=+(title:"chef de projet"~3^4.0 | keywords:"chef de
projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" |
formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de
projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de
projet"~3 | labelLocation:"chef de projet")~0.1 (title:"(chef chef) de
(projet) projet"~3^4.0)~0.1,

3) And the query that works:

debug={

*rawquerystring*=("chef  projet"),

*querystring*=("chef  projet"),

*parsedquery*=+*DisjunctionMaxQuery*((title:"chef projet"~3^4.0 |
keywords:"chef  projet"^3.0 | organizationName:"chef  projet" |
location:"chef  projet" |
formattedDescription:"chef projet"~3^2.0 | nafCodeText:"chef  projet"^2.0 |
jobCodeText:"chef  projet"^3.0 | categoryPayloads:"chef projet"~3 |
labelLocation:"chef  projet")~0.1) *DisjunctionMaxQuery*((title:"(chef
chef) (projet) projet"~3^4.0)~0.1),

*parsedquery_toString*=+(title:"chef projet"~3^4.0 | keywords:"chef  projet"^3.0
| organizationName:"chef  projet" | location:"chef  projet" |
formattedDescription:"chef projet"~3^2.0 | nafCodeText:"chef  projet"^2.0 |
jobCodeText:"chef  projet"^3.0 | categoryPayloads:"chef projet"~3 |
labelLocation:"chef  projet")~0.1 (title:"(chef chef) (projet)
projet"~3^4.0)~0.1,

explain={23715081=

14.832518 = (MATCH) sum of:

I really don't know how to solve this issue and would appreciate any help.

Best wishes

Jean-Claude


On Tue, Nov 15, 2011 at 9:28 PM, Erick Erickson erickerick...@gmail.com wrote:

 The query re-writing is...er...interesting, and I'll skip that for now...

 As for why you're not getting results, 
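For anyone reproducing this against their own instance, the failing request above can be reassembled into a URL from its parameters. A minimal sketch follows; the host, port and /select path are assumptions, while the parameter values are taken from the thread:

```python
from urllib.parse import urlencode

# Parameters mirroring the failing request from the thread.
# The host/core in base_url are placeholders, not from the original post.
params = {
    "q": '("chef de projet")',
    "qt": "edismax",
    "mm": "0",
    "qs": "3",
    "ps": "3",
    "stopwords": "false",
    "debugQuery": "true",
}
base_url = "http://localhost:8983/solr/select"
query_url = base_url + "?" + urlencode(params)
print(query_url)
```

Comparing the debug output of this request, field by field, with the output for the working "chef projet" query is usually the quickest way to see which field's analysis chain drops (or keeps) the stopword "de".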

Re: Aggregated indexing of updating RSS feeds

2011-11-16 Thread sbarriba
All,
Can anyone advise how to stop the deleteAll event during a full import? 

As discussed above, using clean=false with Solr 3.4 still seems to trigger a
delete of all previously imported data. I want to aggregate the results of
multiple imports.

Thanks in advance.
S
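For reference, the request that is supposed to preserve existing documents passes clean=false to the DataImportHandler. A sketch of assembling such a request (the host and handler path are assumptions):

```python
from urllib.parse import urlencode

# DataImportHandler parameters: clean=false asks DIH not to delete the
# existing index before importing; commit=true commits the new documents.
# Host and handler path are placeholders.
params = {"command": "full-import", "clean": "false", "commit": "true"}
dih_url = "http://localhost:8983/solr/dataimport?" + urlencode(params)
print(dih_url)
```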

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Aggregated-indexing-of-updating-RSS-feeds-tp3485335p3512260.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can we have lucene regular and fastVectorHighlighter together in solr

2011-11-16 Thread Koji Sekiguchi

(11/11/16 18:58), Shyam Bhaskaran wrote:

Hi,

Can we use the regular Lucene highlighter along with FastVectorHighlighter
in solrconfig.xml (Solr)?

-Shyam



Yes, you can. See the <highlighting> section in
solr/example/solr/conf/solrconfig.xml for an example.
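The two highlighters can also be chosen per request: the hl.useFastVectorHighlighter parameter switches between the regular highlighter and FastVectorHighlighter. A sketch of two requests differing only in that flag (the query and field names are illustrative, not from the thread):

```python
from urllib.parse import urlencode

base = {"q": "text:solr", "hl": "true", "hl.fl": "title"}

# Regular Lucene highlighter (the default).
regular = urlencode({**base, "hl.useFastVectorHighlighter": "false"})
# FastVectorHighlighter for the same request.
fast = urlencode({**base, "hl.useFastVectorHighlighter": "true"})
print(regular)
print(fast)
```

Note that FastVectorHighlighter requires termVectors="true", termPositions="true" and termOffsets="true" on the highlighted field in schema.xml.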

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Rich document indexing

2011-11-16 Thread kumar8anuj
I am using Solr 3.4 and have configured my DataImportHandler to get some data from
MySQL as well as index some rich documents from the disk.

This is the part of the db-data-config file where I am indexing rich text
documents:


<entity name="resume" dataSource="ds-db"
        query="Select name, js_login_id div 25000 as dir from js_resumes where
               js_login_id='${js_logins.id}' and is_primary = 1 and deleted=0 and mask_cv != 1"
        pk="resume_name"
        deltaQuery="select js_login_id from js_resumes where
               modified > '${dataimporter.last_index_time}' and is_primary = 1 and deleted=0"
        parentDeltaQuery="select jsl.id as id from
               service_request_histories srh, service_requests sr, js_login_screenings jsls,
               js_logins jsl where jsl.status IN (1,2) and srh.service_request_id = sr.id
               and jsl.id = jsls.js_login_id and srh.status in ('8','43') and jsls.id = srh.sid
               and date(srh.created) < date_sub(now(), interval 2 day) and jsl.id =
               '${js_resumes.js_login_id}'">

  <entity processor="TikaEntityProcessor"
          tikaConfig="tika-config.xml"
          url="http://localhost/resumes-new/resumes${resume.dir}/${js_logins.id}/${resume.name}"
          dataSource="ds-file" format="text">
    <field column="text" name="resume" />
  </entity>
</entity>


But after some time I get the following error in my error log. It looks like
a missing-class error. Can anyone tell me which POI jar version would work
with Tika 0.6? Currently I have poi-3.7.jar.

The error I am getting is this:

SEVERE: Exception while processing: js_logins document :
SolrInputDocument[{id=id(1.0)={100984},
complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575},
emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat
Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.init(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
 
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268) 
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187) 
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
 
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) 
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) 
Caused by: java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.init(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.init(XWPFWordExtractorDecorator.java:163)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.init(XWPFWordExtractorDecorator.java:161)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
 
at
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
 
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51) 
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) 
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) 
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
 
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
 
... 7 more 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Rich-document-indexing-tp3512276p3512276.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryError when using query with sort

2011-11-16 Thread Benson Ba
Hi Hamid,

I also encountered the same OOM issue on a Windows 2003 (32-bit) server, but
with only 3 million articles stored in Solr. I would like to know your
configuration for handling so many records.
Many thanks.


Best Regards
Benson


--
View this message in context: 
http://lucene.472066.n3.nabble.com/OutOfMemoryError-when-using-query-with-sort-tp729437p3512224.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different maxAnalyzedChars value in solrconfig.xml

2011-11-16 Thread Koji Sekiguchi

(11/11/16 13:12), Shyam Bhaskaran wrote:

Hi,

I wanted to know whether we can set different maxAnalyzedChars values in
solrconfig.xml based on different fields.

Can someone tell me if this is possible at all? My requirement needs me to set
different values for the maxAnalyzedChars parameter based on two different field
values.

For example, if the field type has the value xxx then maxAnalyzedChars needs to be set to
1MB, and if the value is yyy it needs to be set to 3MB. Let me know if
this can be done and how to set it.


I don't think it is possible.

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Problems installing Solr PHP extension

2011-11-16 Thread Adolfo Castro Menna
Pecl installation is kinda buggy. I installed it ignoring pecl dependencies
because I already had them.

Try: pecl install -n solr  (-n ignores dependencies)
And when it prompts for curl and libxml, point the path to where you have
installed them, probably in /usr/lib/

Cheers,
Adolfo.

On Tue, Nov 15, 2011 at 7:27 PM, Travis Low t...@4centurion.com wrote:

 I know this isn't strictly Solr, but I've been at this for hours and I'm at
 my wits' end.  I cannot install the Solr PECL extension (
 http://pecl.php.net/package/solr), either by command line pecl install
 solr or by downloading and using phpize.  Always the same error, which I
 see here:

 http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html

 It boils down to this:
 PHP Warning: PHP Startup: Unable to load dynamic library
 '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so:
 undefined symbol: curl_easy_getinfo in Unknown on line 0

 I am using the current Solr PECL extension.  PHP 5.3.8.  Curl 7.21.3.  Yes,
 libcurl and libcurl-dev are both installed, also 7.21.3.  Fedora Core 15,
 patched to current levels.

 Please help!

 cheers,

 Travis
 --

 **

 *Travis Low, Director of Development*


 ** t...@4centurion.com* *

 *Centurion Research Solutions, LLC*

 *14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

 *703-956-6276 *•* 703-378-4474 (fax)*

 *http://www.centurionresearch.com* http://www.centurionresearch.com

 **The information contained in this email message is confidential and
 protected from disclosure.  If you are not the intended recipient, any use
 or dissemination of this communication, including attachments, is strictly
 prohibited.  If you received this email message in error, please delete it
 and immediately notify the sender.

 This email message and any attachments have been scanned and are believed
 to be free of malicious software and defects that might affect any computer
 system in which they are received and opened. No responsibility is accepted
 by Centurion Research Solutions, LLC for any loss or damage arising from
 the content of this email.



Join and faceting by children's attributes

2011-11-16 Thread Tobias
Hello,

I currently have a demand for faceting on the children of a join query.

My index is set up in a way that there are parent and child documents.
The child documents do have the facet information in a (precisely: some)
multivalue field(s). The parent documents themselves do not have any of it.

As the join query support allows me to do a simple search within the child
documents and return documents from the parent document space, I thought
there probably is a way to figure out the available facet values from the
child document space and present both in the result set, but this seems
more difficult than I thought it would be.

The join query support would allow me to filter on specific
child-document-space
facet fields, for example:

but I cannot really find a way to present *which faceting options are
available* in the result set in the first place.

Denormalizing my index in a way that the parent documents would contain
the faceting information is not an option at the moment, because I wanted
to keep the index more generic, so that there's not one field per attribute
but two generic attribute fields (multi-value), that keep the Key/Value
pairs,
like the following table shows. I need this setup because at index setup
time
I do not know which attributes for the various products/items will be
available.



If I now would denormalize a bunch of shoe child items into the parent
product
it would always contain all possible size/color combinations, even if some
of
the child products do not meet the initial search term's criteria, e.g.
searching above for (title:Sneakers AND desc:cool) should return just facets
for size (2), color (2), red (1), blue (1), 40 (1) and 42 (1),
which I do postprocess in my client application, so that I know that
red and blue are colors and 40 and 42 are sizes.

I thought that you experts might have an idea on how to continue from there.

Best,
Tobias


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-faceting-by-children-s-attributes-tp3512629p3512629.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Score Normalization

2011-11-16 Thread Jan Høydahl
Perhaps you can solve your use case by playing with the new eDismax boost
parameter, which multiplies the function's value into the score instead of
adding it.
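A sketch of what this looks like at the request level: with defType=edismax, a boost parameter multiplies the function's value into the score, while bf adds it (the field and function names below are the illustrative ones from the question, not a tested setup):

```python
from urllib.parse import urlencode

# Additive: bf adds the function results to the text score (original setup).
additive = urlencode({
    "defType": "edismax",
    "qf": "field1^2 field2^3",
    "bf": "func1(field3)^2 func2(field4)^3",
})

# Multiplicative: boost multiplies the text score by the function value,
# so a function returning 0..1 acts like a scaling factor.
multiplicative = urlencode({
    "defType": "edismax",
    "qf": "field1^2 field2^3",
    "boost": "func1(field3)",
})
print(additive)
print(multiplicative)
```

This does not normalize the text score itself, but because the functions return values between 0 and 1, the multiplicative form keeps their influence proportional to the text score rather than adding an unbounded offset.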

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5. nov. 2011, at 01:26, sangrish wrote:

 
 Hi,
 
 
   I have a (dismax) request handler which has the following 3 scoring
 components (1 qf & 2 bf):
 
    qf = field1^2 field2^3
    bf = func1(field3)^2 func2(field4)^3
 
  Both func1 & func2 return scores between 0 & 1. The score returned by
 textual match (qf) ranges from 0 to NOT_A_FIXED_NUMBER.
 
   To allow better combination of the text match & my functions, I want the text
 score to be normalized between 0 & 1. Is there any way I can achieve that
 here?
 
 Thanks
 Sid
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Score-Normalization-tp3481627p3481627.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search in multivalued string field does not work

2011-11-16 Thread mechravi25
Hi,

Thanks for the suggestions.

The index is the same on both servers. We index using JDBC drivers.

We have not modified the request handler in solrconfig on either machine,
and after the latest schema update we have re-indexed the data.


*We even checked the analysis page and there is no difference between the
two servers; after checking the highlight matches option on the field
value, the result was highlighted in the term text of the index
analyzer. But we are still confused as to why we are not getting the result
on the search page.*

Actually, I forgot to post the dynamic field declarations in my schema file;
this is how they are declared:

<dynamicField name="idx_*" type="textgen" indexed="true" stored="true"
              multiValued="true" />
<dynamicField name="*Facet" type="string" indexed="true"
              multiValued="true" stored="false" />

The textgen fieldtype definition is as follows:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Soundex"
            inject="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


We have implemented shards: the db core in turn gets results from the
shard cores (core1 and core2). The actual data is present in core2. We tried
all the options on core2 directly as well, but with no success.

The query is passed as follows:

QueryString: idx_ABCFacet:XXX... ABC DEF

INFO: [core2] webapp=/solr path=/select
params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1}
hits=0 status=0 QTime=2
Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select
params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1}
hits=0 status=0 QTime=0
Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/select/
params={debugQuery=on&indent=on&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&version=2.2&rows=10}
status=0 QTime=64



Also, can you please elaborate on the 3rd point?

*3 Try using Luke to examine the indexes on both servers to determine
 whether they're the same. *




Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-in-multivalued-string-field-does-not-work-tp3509458p3512710.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problems with AutoSuggest feature(Terms Components)

2011-11-16 Thread mechravi25
Hi,

When I search for data, I noticed two things:

1.) I noticed *terms.regex=.** in the logs, which does a blank search
on terms, and because of it the query time is high. Is there any way to overcome
this? My actual query should go like the first one (bolded), but instead
it ends up like the second case (the 2nd text highlighted in bold).

2.) I also noticed *terms.limit=-1*, which is very expensive as it asks
Solr to return all the terms. It should be set to 10 or 20 at most.
Please provide some suggestions on how to set this.
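A sketch of the intended request: set terms.limit explicitly and, where a leading-string match is enough, prefer terms.prefix over terms.regex, since prefix matching is far cheaper. The host, field name and prefix below are illustrative:

```python
from urllib.parse import urlencode

# TermsComponent parameters for an autosuggest-style request.
params = {
    "qt": "/terms",
    "terms": "true",
    "terms.fl": "nameFacet",
    "terms.limit": "10",    # cap the number of terms returned
    "terms.prefix": "abc",  # cheaper than terms.regex for leading matches
    "terms.sort": "index",
}
terms_url = "http://localhost:8983/solr/terms?" + urlencode(params)
print(terms_url)
```

terms.regex remains available for patterns a prefix cannot express, but combining it with terms.limit=-1 forces every matching term to be scanned and returned, which is consistent with the ~107-second QTimes in the logs below.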



Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/terms
params={*terms.regex=ABC\+CCC\+lll\+data.**&terms.regex.flag=case_insensitive&terms.fl=nameFacet}
status=0 QTime=935
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/terms
params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=ABC\+CCC\+lll\+data.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1}
status=0 QTime=842
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/terms
params={terms.regex=ABC\+CCC\+lll\+data.*&terms.regex.flag=case_insensitive&terms.fl=nameFacet}
status=0 QTime=927
Nov 14, 2011 2:04:08 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/terms
params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1}
status=0 QTime=115

Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/terms
params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&*terms.regex=.**&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1}
status=0 QTime=106767
Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/terms
params={terms.regex.flag=case_insensitive&shards.qt=/terms&terms.fl=nameFacet&terms=true&terms.limit=-1&terms.regex=.*&isShard=true&qt=/terms&wt=javabin&terms.sort=index&version=1}
status=0 QTime=106766
Nov 14, 2011 2:05:55 PM org.apache.solr.core.SolrCore execute

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3512734.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help! - ContentStreamUpdateRequest

2011-11-16 Thread Tod

Erick,

Autocommit is commented out in solrconfig.xml.  I have avoided commits
until after the indexing process is complete.  As an experiment I tried
committing every n records processed to see if varying n would make a
difference; it really didn't change much.


My original use case had the client running on the Solr server and
streaming the document content over from a web server, based on the URL
gathered by a query against a backend database.  The locking problem
appeared there first, so I tried moving the client code to the web server
to be closer to the documents' origin.  That helped a little but it ended
up locking again, which is where I am now.


Solr should be able to index far more documents than the 35K I'm trying
to index.  It seems from others' accounts that they are able to do what I'm
trying to do successfully.  Therefore I believe I must be doing
something extraordinarily dumb.  I'll be happy to share any information
about my environment or configuration if it will help find my error.


Thanks for all of your help.


- Tod
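One way to move from one-document-at-a-time to batches is to accumulate documents into Solr's XML update format and post each batch to /update in a single request. The sketch below only builds the payloads; the field names, the sample rows and the batch size are illustrative, not from the original poster's code:

```python
import xml.etree.ElementTree as ET

def build_batch(docs):
    """Build a Solr XML <add> message containing several documents."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Illustrative metadata rows, e.g. as fetched from the backend database.
rows = [{"id": i, "title": "doc %d" % i} for i in range(5)]

BATCH_SIZE = 2
batches = [rows[i:i + BATCH_SIZE] for i in range(0, len(rows), BATCH_SIZE)]
payloads = [build_batch(b) for b in batches]
# Each payload would be POSTed to /update; commit once at the very end.
print(len(payloads))
```

Sending a single commit after the final batch, rather than one per document, keeps the per-request overhead down, in line with the advice earlier in the thread.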





On 11/15/2011 8:08 PM, Erick Erickson wrote:

That's odd. What are your autocommit parameters? And are you either
committing or optimizing as part of your program? I'd bump the
autocommit parameters up and NOT commit (or optimize) from your
client if you are

Best
Erick

On Tue, Nov 15, 2011 at 2:17 PM, Todlistac...@gmail.com  wrote:

Otis,

The files are only part of the payload.  The supporting metadata exists in a
database.  I'm pulling that information, as well as the name and location of
the file, from the database and then sending it to a remote Solr instance to
be indexed.

I've heard Solr would prefer to get documents it needs to index in chunks
rather than one at a time as I'm doing now.  The one at a time approach is
locking up the Solr server at around 700 entries.  My thought was if I could
chunk them in a batch at a time the lockup will stop and indexing
performance would improve.


Thanks - Tod

On 11/15/2011 12:13 PM, Otis Gospodnetic wrote:


Hi,

How about just concatenating your files into one? Would that work for
you?

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




From: Todlistac...@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, November 14, 2011 4:24 PM
Subject: Help! - ContentStreamUpdateRequest

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to
stream a LOT of files at once rather than just one?  It has to be something
simple like a collection of some sort but I just can't get it figured out.
Maybe I'm using the wrong class altogether?


TIA












Re: How to mix solr query info into the apache httpd logging (reverseproxy)?

2011-11-16 Thread alex_mass
Thanks for the answer; mixing it in with the params will certainly be the
easiest solution.

Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-mix-solr-query-info-into-the-apache-httpd-logging-reverseproxy-tp3498539p3513097.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems installing Solr PHP extension

2011-11-16 Thread Travis Low
Thanks so much for responding.  I tried your suggestion and the pecl build
*seems* to go okay, but after restarting Apache, I get this again in the
error_log:

 PHP Warning: PHP Startup: Unable to load dynamic library
 '/usr/lib64/php/modules/solr.so' - /usr/lib64/php/modules/solr.so:
 undefined symbol: curl_easy_getinfo in Unknown on line 0

I'm baffled by this because the undefined symbol is in libcurl.so, and I've
specified the path to that library.

If I can't solve this problem then we'll basically have to write our own
PHP Solr client, which would royally suck.

cheers,

Travis

On Wed, Nov 16, 2011 at 7:11 AM, Adolfo Castro Menna 
adolfo.castrome...@gmail.com wrote:

 Pecl installation is kinda buggy. I installed it ignoring pecl dependencies
 because I already had them.

 Try: pecl install -n solr  (-n ignores dependencies)
 And when it prompts for curl and libxml, point the path to where you have
 installed them, probably in /usr/lib/

 Cheers,
 Adolfo.

 On Tue, Nov 15, 2011 at 7:27 PM, Travis Low t...@4centurion.com wrote:

  I know this isn't strictly Solr, but I've been at this for hours and I'm
 at
  my wits end.  I cannot install the Solr PECL extension (
  http://pecl.php.net/package/solr), either by command line pecl install
  solr or by downloading and using phpize.  Always the same error, which I
  see here:
 
 
 http://www.lmpx.com/nav/article.php/news.php.net/php.qa.reports/24197/read/index.html
 
  It boils down to this:
  PHP Warning: PHP Startup: Unable to load dynamic library
  '/root/solr-0.9.11/modules/solr.so' - /root/solr-0.9.11/modules/solr.so:
  undefined symbol: curl_easy_getinfo in Unknown on line 0
 
  I am using the current Solr PECL extension.  PHP 5.3.8.  Curl 7.21.3.
  Yes,
  libcurl and libcurl-dev are both installed, also 7.21.3.  Fedora Core 15,
  patched to current levels.
 
  Please help!
 
  cheers,
 
  Travis
  --
 
  **
 
  *Travis Low, Director of Development*
 
 
  ** t...@4centurion.com* *
 
  *Centurion Research Solutions, LLC*
 
  *14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*
 
  *703-956-6276 *•* 703-378-4474 (fax)*
 
  *http://www.centurionresearch.com* http://www.centurionresearch.com
 
 




-- 

**

*Travis Low, Director of Development*


** t...@4centurion.com* *

*Centurion Research Solutions, LLC*

*14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

*703-956-6276 *•* 703-378-4474 (fax)*

*http://www.centurionresearch.com* http://www.centurionresearch.com



Re: Problems installing Solr PHP extension

2011-11-16 Thread Michael Kuhlmann

On 16.11.2011 17:11, Travis Low wrote:


If I can't solve this problem then we'll basically have to write our own
PHP Solr client, which would royally suck.


Oh, if you really can't get the library to work, no problem - there are
several PHP clients out there that don't need a PECL installation.


Personally, I have used http://code.google.com/p/solr-php-client/, it 
works well.


-Kuli


Re: Problems installing Solr PHP extension

2011-11-16 Thread Travis Low
Ah, ausgezeichnet, thank you Kuli!  We'll just use that.

On Wed, Nov 16, 2011 at 11:35 AM, Michael Kuhlmann k...@solarier.de wrote:

 On 16.11.2011 17:11, Travis Low wrote:


 If I can't solve this problem then we'll basically have to write our own
 PHP Solr client, which would royally suck.


 Oh, if you really can't get the library to work, no problem - there are
 several PHP clients out there that don't need a PECL installation.

 Personally, I have used http://code.google.com/p/solr-php-client/,
 it works well.

 -Kuli




-- 

**

*Travis Low, Director of Development*


** t...@4centurion.com* *

*Centurion Research Solutions, LLC*

*14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

*703-956-6276 *•* 703-378-4474 (fax)*

*http://www.centurionresearch.com* http://www.centurionresearch.com



Re: Easy way to tell if there are pending documents

2011-11-16 Thread Justin Caratzas

You can enable the stats handler
(https://issues.apache.org/jira/browse/SOLR-1750), and inspect the
JSON programmatically.
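A minimal sketch of doing that from a script (the stats URL and the exact JSON layout vary by Solr version, so the helper below just walks the parsed structure; the endpoint in the comment is an assumption):

```python
import json

def find_stat(node, name):
    """Depth-first search for a named stat in a parsed stats response.
    The JSON layout varies by Solr version, so we just walk the structure."""
    if isinstance(node, dict):
        if name in node:
            return node[name]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_stat(child, name)
        if found is not None:
            return found
    return None

# Fetching is environment-specific; with the stats handler enabled it is
# roughly (the URL here is an assumption -- adjust for your install):
#   import urllib.request
#   raw = urllib.request.urlopen(
#       "http://localhost:8983/solr/admin/stats.jsp?wt=json").read()
#   stats = json.loads(raw)
stats = json.loads('{"updateHandler": {"stats": {"docsPending": 3}}}')
print(find_stat(stats, "docsPending"))  # -> 3
```

A gatekeeper in the indexing application could then refuse to run the "unsafe" operation whenever the value is non-zero.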

-- Justin

Latter, Antoine antoine.lat...@legis.wisconsin.gov writes:

 Thank you, that does help - but I am more looking for a way to get at this 
 programmatically.

 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
 Sent: Tuesday, November 15, 2011 11:22 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Easy way to tell if there are pending documents

 Antoine,

 On Solr Admin Stats page search for docsPending.  I think this is what you 
 are looking for.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
 search :: http://search-lucene.com/



From: Latter, Antoine antoine.lat...@legis.wisconsin.gov
To: 'solr-user@lucene.apache.org' solr-user@lucene.apache.org
Sent: Monday, November 14, 2011 11:39 AM
Subject: Easy way to tell if there are pending documents

Hi Solr,

Does anyone know of an easy way to tell if there are pending documents 
waiting for commit?

Our application performs operations that are never safe to perform
 while commits are pending. We make this work by making sure that all
 indexing operations end in a commit, and stop the unsafe operations
 from running while a commit is running.

This works great most of the time, except when we have enough disk
 space to add documents to the pending area, but not enough disk
 space to do a commit - then the indexing operations only error out
 after they've done all of their adds.

It would be nice if the unsafe operation could somehow detect that there are 
pending documents and abort.

In the interim I'll have the unsafe operation perform a commit when it 
starts, but I've been weeding out useless commits from my app recently and I 
don't like them creeping back in.

Thanks,
Antoine






RE: Easy way to tell if there are pending documents

2011-11-16 Thread Latter, Antoine
Excellent. It looks like I can drill down into exactly what I want without 
having to load up the rest of the statistics.

-Original Message-
From: Justin Caratzas [mailto:justin.carat...@gmail.com] 
Sent: Wednesday, November 16, 2011 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Easy way to tell if there are pending documents


You can enable the stats handler
(https://issues.apache.org/jira/browse/SOLR-1750), and inspect the JSON
programmatically.

-- Justin

Latter, Antoine antoine.lat...@legis.wisconsin.gov writes:

 Thank you, that does help - but I am more looking for a way to get at this 
 programmatically.

 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Tuesday, November 15, 2011 11:22 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Easy way to tell if there are pending documents

 Antoine,

 On Solr Admin Stats page search for docsPending.  I think this is what you 
 are looking for.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene 
 ecosystem search :: http://search-lucene.com/



From: Latter, Antoine antoine.lat...@legis.wisconsin.gov
To: 'solr-user@lucene.apache.org' solr-user@lucene.apache.org
Sent: Monday, November 14, 2011 11:39 AM
Subject: Easy way to tell if there are pending documents

Hi Solr,

Does anyone know of an easy way to tell if there are pending documents 
waiting for commit?

Our application performs operations that are never safe to perform  
while commits are pending. We make this work by making sure that all  
indexing operations end in a commit, and stop the unsafe operations  
from running while a commit is running.

This works great most of the time, except when we have enough disk  
space to add documents to the pending area, but not enough disk  space 
to do a commit - then the indexing operations only error out  after 
they've done all of their adds.

It would be nice if the unsafe operation could somehow detect that there are 
pending documents and abort.

In the interim I'll have the unsafe operation perform a commit when it 
starts, but I've been weeding out useless commits from my app recently and I 
don't like them creeping back in.

Thanks,
Antoine






strange behavior of scores and term proximity use

2011-11-16 Thread Ariel Zerbib
Hi,

For this term proximity query: ab_main_title_l0:"to be or not to be"~1000

http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true

The first three results are the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">5</int>
</lst>
<result name="response" numFound="318" start="0" maxScore="3.0814114">
  <doc>
    <long name="id">2315190010001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n To be or not to be a Jew. 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float>
  </doc>
  <doc>
    <long name="id">2313006480001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n To be or not to be 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float>
  </doc>
  <doc>
    <long name="id">2356410250001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
  <str name="querystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
  <str name="parsedquery">PhraseQuery(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000)</str>
  <str name="parsedquery_toString">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
  <lst name="explain">
    <str name="2315190010001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)
    </str>
    <str name="2313006480001021">
9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)
    </str>
    <str name="2356410250001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)
    </str>
  </lst>
</lst>
</response>

The version used is a 4.0 October snapshot.

I have 2 questions about the result:
- Why are the scores in the debug output different from the scores in
the result list?
- What is the expected behavior of this kind of term proximity query?
  - The debug scores seem to be ordered correctly, but the result scores
seem to be wrong.
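For what it's worth, the fieldWeight values in the explain output are internally consistent with DefaultSimilarity, where fieldWeight = sqrt(phraseFreq) * idf * fieldNorm; a quick sanity check of the first entry (plain Python, not Solr code):

```python
import math

# Numbers taken from the first explain entry above
phrase_freq = 0.3334   # sloppy-phrase frequency
idf_sum = 29.581549        # sum of the per-term idf() values
field_norm = 0.3125        # fieldNorm(doc=378403)

# DefaultSimilarity: fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
field_weight = math.sqrt(phrase_freq) * idf_sum * field_norm
print(field_weight)  # agrees with the reported 5.337161

# The exact-match doc has phraseFreq=1.0, giving 29.581549 * 0.3125 = 9.244234
```

So the arithmetic within the explain section checks out; the open question is why the result list reports 3.0814114 for all three docs instead.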


Thanks,
Ariel


Re: Phrase between quotes with dismax edismax

2011-11-16 Thread Erick Erickson
Ah, ok I was mis-reading some things. So, let's ignore the
category bits for now.

Questions:
1. Can you refine down the problem? That is,
demonstrate this with a single field and leave out
the category stuff. Something like
q=title:"chef de projet" getting no results and
q=title:"chef projet" getting results? The idea
is to cycle through all the fields to see if we can
hone in on the problem. I'd get rid of any pf
parameters of your edismax definition too. I'm after
the simplest case that can demonstrate the issue.
For that matter, it'd be even easier if you could
make this happen with the default searcher
(solr/select?q=title:"chef de projet").
2. If you can do 1, please post the field definitions
from your schema.xml file. One possibility is that
you are removing stopwords at index time but not
query time or vice versa, but that's a wild guess.
3. Once you have a field, use the admin/analysis page
to see the exact transformations that occur at index
and query time to see if anything jumps out.

All in all, I suspect you have a field that isn't being parsed
as you expect at either index or query time, but as I said
above, that's a guess.

Best
Erick

On Wed, Nov 16, 2011 at 5:02 AM, Jean-Claude Dauphin
jc.daup...@gmail.com wrote:
 Thanks Erick for yr quick answer.

 I am using Solr 3.1

 1) I have set the mm parameter to 0 and removed the categories from the
 search. Thus the query is only for "chef de projet" and nothing else.
 But the problem remains, i.e. searching for "chef de projet" gives no
 results while searching for "chef projet" gives the right result.

 Here is an excerpt from the test I made:

 DISMAX query (q)=("chef de projet")

 =The Parameters=

 *queryResponse*=[{responseHeader={status=0,QTime=157,

 params={facet=true,

 f.createDate.facet.date.start=NOW/DAY-6DAYS,tie=0.1,

 facet.limit=4,

 f.location.facet.limit=3,

 *q.alt*=*:*,

 facet.date.other=all,

 hl=true,version=2,

 *bq*=[categoryPayloads:category1071^1,
 categoryPayloads:category10055078^1, categoryPayloads:category10055405^1],

 fl=*,score,

 debugQuery=true,

 facet.field=[soldProvisions, contractTypeText, nafCodeText, createDate,
 wage, keywords, labelLocation, jobCode, organizationName,
 requiredExperienceLevelText],

 *qs*=3,

 qt=edismax,

 facet.date.end=NOW/DAY,

 *mm*=0,

 facet.mincount=1,

 facet.date=createDate,

 *qf*= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
 categoryPayloads^1.0,

 hl.fl=title,

 wt=javabin,

 rows=20,

 start=0,

 *q*=("chef de projet"),

 facet.date.gap=+1DAY,

 *stopwords*=false,

 *ps*=3}},

 The Solr Response
 response={numFound=0

 Debug Info

 debug={

 *rawquerystring*=("chef de projet"),

 *querystring*=("chef de projet"),

 *---
 *

 *parsedquery*=

 +*DisjunctionMaxQuery*((title:"chef de projet"~3^4.0 | keywords:"chef de
 projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" |
 formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de
 projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de
 projet"~3 | labelLocation:"chef de projet")~0.1)
 *DisjunctionMaxQuery*((title:"(chef
 chef) de (projet) projet"~3^4.0)~0.1) categoryPayloads:category1071
 categoryPayloads:category10055078 categoryPayloads:category10055405,

 *---*

 *parsedquery_toString*=+(title:"chef de projet"~3^4.0 | keywords:"chef de
 projet"^3.0 | organizationName:"chef de projet" | location:"chef de projet" |
 formattedDescription:"chef de projet"~3^2.0 | nafCodeText:"chef de
 projet"^2.0 | jobCodeText:"chef de projet"^3.0 | categoryPayloads:"chef de
 projet"~3 | labelLocation:"chef de projet")~0.1 (title:"(chef chef) de
 (projet) projet"~3^4.0)~0.1 categoryPayloads:category1071
 categoryPayloads:category10055078 categoryPayloads:category10055405,



 explain={},

 QParser=ExtendedDismaxQParser,altquerystring=null,

 *boost_queries*=[categoryPayloads:category1071^1,
 categoryPayloads:category10055078^1, categoryPayloads:category10055405^1],

 *parsed_boost_queries*=[categoryPayloads:category1071,
 categoryPayloads:category10055078, categoryPayloads:category10055405],
 boostfuncs=null,

 2) I tried to remove the bq values but no changes:

 *querystring*=("chef de projet"),

 *parsedquery*=+*DisjunctionMaxQuery*((title:"chef de projet"~3^4.0 |
 keywords:"chef de projet"^3.0 | organizationName:"chef de projet" |
 location:"chef de projet" | formattedDescription:"chef de projet"~3^2.0 |
 nafCodeText:"chef de projet"^2.0 | jobCodeText:"chef de projet"^3.0 |
 categoryPayloads:"chef de projet"~3 | labelLocation:"chef de projet")~0.1) *
 DisjunctionMaxQuery*((title:"(chef chef) de (projet)
 projet"~3^4.0)~0.1),
 *parsedquery_toString*=+(title:chef de projet~3^4.0 | keywords:chef de
 projet^3.0 | organizationName:chef de projet | location:chef de projet |
 formattedDescription:chef de 


Re: Dismax and phrases

2011-11-16 Thread Chris Hostetter

: I am starting to wonder whether the module giving finnish language support
: (lingsoft) might be the cause?

It's extremely possible -- the details really matter when debugging 
things like this.

Since i don't have any access to these custom plugins, i don't know what 
they might be doing, or how they might be affecting the terms produced 
during analysis to explain why you are getting the structure you are -- 
but one explanation might be if every term produced by them gets a 
positionIncrement of 0 ... that would tell the query parser to treat 
them as alternatives -- it's the same thing SynonymFilter does.

you'd have to look at the output from the analysis tool ,feeding your 
example input into the query analyzer to see what terms it produces (and 
what attributes those terms have).  if it is a position increment issue, 
then you should see the same OR style query structure (instead of a 
phrase query) even if you use the default lucene parser and give it a 
quoted phrase...

text_fi:"asuntojen hinnat"


-Hoss


Re: Search in multivalued string field does not work

2011-11-16 Thread Erick Erickson
Attach debugQuery=true to the URL and look at the results, that'll
show you what the query parsed as on the actual server.

Where did shards come from? I'd advise turning all the shard stuff
off until you answer this question and querying the server directly,
shards may be confusing the issue. Let's get to the bottom of your
query problems before introducing that complexity!

By Luke, I mean get a copy of the Luke program, see here:
http://code.google.com/p/luke/

Run that program and point it at the index for your severs. It'll allow
you to examine the contents of of the indexes at a fairly low level.
Look at the fields in question and see if the data you expect to match is,
indeed, there.

From what you've said, I'd guess it's some difference between
the two servers, because on the surface of it I don't see why you'd
be seeing the differences you claim. So either what you think is on
the servers isn't there, or I don't understand the problem, or...

Best
Erick

On Wed, Nov 16, 2011 at 9:11 AM, mechravi25 mechrav...@yahoo.co.in wrote:
 Hi,

 Thanks for the suggestions.

 The index is the same in both the servers. We index using JDBC drivers.

 We have not modified the request handler in solrconfig on either machine and
 also after the latest schema update, we have re-indexed the data.


 *We even checked the analysis page and there is no difference between the
 two servers. After checking the highlight matches option on the field
 value, the result was getting highlighted in the term text of the Index
 Analyzer. But we are still confused as to why we are not getting the result
 in the search page.*

 Actually i forgot to post the dynamic field declaration in my schema file
 and this is how it is declared.

 <dynamicField name="idx_*" type="textgen" indexed="true" stored="true"
 multiValued="true"/>
 <dynamicField name="*Facet" type="string" indexed="true"
 multiValued="true" stored="false"/>

 the textgen fieldtype definition is as follows:

 <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords.txt" enablePositionIncrements="true"/>
   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="1" catenateNumbers="1"
 catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1"
 stemEnglishPossessive="1"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.PhoneticFilterFactory" encoder="Soundex"
 inject="true"/>
  </analyzer>
  <analyzer type="query">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>
   <filter class="solr.StopFilterFactory"
           ignoreCase="true"
           words="stopwords.txt"
           enablePositionIncrements="true"/>
   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
 catenateAll="0" splitOnCaseChange="0"/>
   <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
 </fieldType>


 We have implemented shards in core DB which is in turn gets a result from
 shards core(core1 and core2). This actual data is present in core2. We tried
 all the options in core2 directly as well but with no success.

 The query is passed as follows :

 QueryString : idx_ABCFacet:XXX... ABC DEF

 INFO: [core2] webapp=/solr path=/select
 params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1}
 hits=0 status=0 QTime=2
 Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
 INFO: [core1] webapp=/solr path=/select
 params={debugQuery=false&fl=uid,score&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&isShard=true&wt=javabin&fsv=true&rows=10&version=1}
 hits=0 status=0 QTime=0
 Nov 16, 2011 5:44:17 AM org.apache.solr.core.SolrCore execute
 INFO: [db] webapp=/solr path=/select/
 params={debugQuery=on&indent=on&start=0&q=idx_ABCFacet:XXX...+ABC+DEF&version=2.2&rows=10}
 status=0 QTime=64



 Also can you please elaborate on the 3rd point

 *3 Try using Luke to examine the indexes on both servers to determine
     whether they're the same. *




 Thanks.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Search-in-multivalued-string-field-does-not-work-tp3509458p3512710.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Similar documents and advantages / disadvantages of MLT / Deduplication

2011-11-16 Thread Chris Hostetter

: I index 1000 docs, 5 of them are 95% the same (for example: copy pasted
: blog articles from different sources, with slight changes (author name,
: etc..)).
: But they have differences.
: *Now i like to see 1 doc in my result set and the other 4 should be marked
: as similar.*

Do you actually want all 1000 docs in your index, or do you want to prevent 
4 of the 5 copies of the doc from being indexed?

Either way, if the the TextProfileSignature is doing a good job of 
identifying the 5 similar docs, then use that at index time.

If you want to keep 4/5 out of the index, then use the Deduplication 
features to prevent the duplicates from being indexed and you're done.  

If you want all docs in the index, then you have to decide how you want to 
mark docs as similar ... do you want to only have one of those docs 
appear in all of your results, or do you want all of them in the results 
but with an indication that there are other similar docs?  If the former: 
then take a look at Grouping and group on your signature field.  If the 
latter, use the MLT component, to find similar docs based on the signature 
field (ie: mlt.fl=signature_t)

https://wiki.apache.org/solr/FieldCollapsing
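A grouped request along those lines might look like the following (parameter names are from the FieldCollapsing wiki page; the signature field name is the assumption carried over from mlt.fl above):

```
/solr/select?q=*:*&group=true&group.field=signature_t&group.limit=1
```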

-Hoss


Re: size of data replicated

2011-11-16 Thread Chris Hostetter

: query response time. To get a clear picture, I would like to know how
: to get the size of data being replicated for each commit. Through the
: admin UI, you may read a x of y G data is being replicated; however,
: y is the total index size, instead of data being copied over. I
: couldn't find the info in the solr logs either. Any idea?

maybe i'm misunderstanding your question, but isn't x in your example 
the number that you are looking for? (ie: how much data was replicated?)

-Hoss


Re: maxFieldLength clarifications

2011-11-16 Thread Chris Hostetter

:1. is the maxFieldLength parameter deprecated?
:2. what is maxFieldLength counting? I understood it's counting tokens
:per document (not per field)
:3. what if I simply remove the maxFieldLength setting from the
:solrconfig?

1. it has been deprecated and will not be used in Solr 4x, but still 
exists in Solr 3x

2. It should be terms per field per document, not just per document.

3. if you don't specify it in solrconfig.xml it defaults to -1 which 
means no limit.

: From what I see if I remove it from the solrconfig the text values are
: still constrained to some bound since if I query the last term in a long
: document's text I don't get a match.

a) what version of solr are you using?
b) double check both the mainIndex and indexDefaults sections of your 
solrconfig.xml and make sure maxFieldLength isn't in either of them.

-Hoss


Re: Solr Score Normalization

2011-11-16 Thread Chris Hostetter

: Perhaps you can solve your usecase by playing with the new eDismax 
: boost parameter, which multiplies the functions with the other score 
: instead of adding.

and FWIW: the boost param of the edismax parser is really just syntactic 
sugar for using the BoostQParser wrapped around an edismax query -- you 
can wrap it around any query produced by any QParser...

  q={!edismax qf=foo}bar&boost=func(asdf)

...is the same as...

  q={!boost b=func(asdf) v=$qq}&qq={!edismax qf=foo}bar



-Hoss


Re: to prevent number-of-matching-terms in contributing score

2011-11-16 Thread Chris Hostetter

:  1. omitTermFreqAndPositions is very straightforward but if I avoid
: positions I'll refuse to serve phrase queries. I had searched for this in

but do you really need phrase queries on your cat field?  i thought the 
point was to have simple matching on those terms?

:  2. Function query seemed nice (though strange because I never used it
: before) and I gave it a few hours but that too did not seem to solve my
: requirement. The artificial score we are generating is getting multiplied
: into rest of the score which includes score due to cat field as well. (I
: can not remove cat from qf as I have to search there). It is only that
: I don't want this field's score on the basis of matching tf.

I don't think i realized you were using dismax ... if you just want a 
match on cat to help determine if the document is a match, but not have 
*any* impact on score, you could just set the qf boost to 0 (ie: 
qf=title^10 cat^0) but i'm not sure if that's really what you want.

: After spending some hours on function queries I finally reached on
: following query

Honestly: i'm not really following what you tried there because of the 
formatting applied by your email client ... it seemed to be making tons of 
hyperlinks out of pieces of the URL.

Looking at your query explanation however the problem seems to be that you 
are still using the relevancy score of the matches on the cat field, 
instead of *just* using the function boost...

: But debugging the query showed that the boost value ($cat_boost) is being
: multiplied into a value which is generated with the help of cat field
: thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
: 
: 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
: (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:

...my point before was to take cat:chair out of the main part of your 
query, and *only* put it in the boost function.  if you are using dismax, 
the qf=cat^0 suggestion mentioned above *combined* with your boost 
function will probably get you what you want (i think)

: I was thinking there should be some hook or plugin (or anything) which
: could just change the score calculation formula *for a particular field*.
: There is a function in DefaultSimilarity class - *public float tf(float
: freq)* but that does not mention the field name. Is there a possibility to
: look into this direction?

on trunk, there is a distinct Similarity object per fieldtype, so you 
could certainly look at that -- but you are correct that in 3x there is no 
way to override the tf() function on a per field basis.
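As a toy illustration -- plain Python, not the Lucene API -- of what such an override would do: DefaultSimilarity's tf(freq) is sqrt(freq), and a flattened variant scores any number of matches identically:

```python
import math

def tf_default(freq):
    # Lucene 3.x DefaultSimilarity: tf grows with the square root of freq
    return math.sqrt(freq)

def tf_flat(freq):
    # Flattened variant: a match is a match, repeats add nothing
    return 1.0 if freq > 0 else 0.0

for freq in (0, 1, 4, 9):
    print(freq, tf_default(freq), tf_flat(freq))
```

In 3.x this flattening would apply to every field sharing the Similarity, which is exactly the per-field limitation described above.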


-Hoss


Re: Aggregated indexing of updating RSS feeds

2011-11-16 Thread Chris Hostetter

: ..but the request I'm making is..
: /solr/myfeed?command=full-importrows=5000clean=false
: 
: ..note the clean=false.

I see it, but i also see this in the logs you provided...

: INFO: [] webapp=/solr path=/myfeed params={command=full-import} status=0
: QTime=8

...which means someone somewhere is executing full-import w/o using 
clean=false.  

are you absolutely certain that you are executing the request you think 
you are?  can you find a request in your logs that includes clean=false?

if it's not you and your code -- it is coming from somewhere, and that's 
what's causing DIH to trigger a deleteAll...

: 10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.DataImporter
: doFullImport
: INFO: Starting Full Import
: 10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.SolrWriter
: readIndexerProperties
: INFO: Read myfeed.properties
: 10-Nov-2011 05:40:01 org.apache.solr.update.DirectUpdateHandler2 deleteAll
: INFO: [] REMOVING ALL DOCUMENTS FROM INDEX



-Hoss


Re: Add copyTo Field without re-indexing?

2011-11-16 Thread Kashif Khan
Please advise how we can reindex Solr when fields have stored=false. We
cannot reindex the data from the beginning; we just want to read and write
the indexes through SolrJ. Please advise a solution. I know we can do it
using the Lucene classes IndexReader and IndexWriter, but we want to index
all fields.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Add-copyTo-Field-without-re-indexing-tp3342253p3515020.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Add copyTo Field without re-indexing?

2011-11-16 Thread Michael Kuhlmann

Am 17.11.2011 08:46, schrieb Kashif Khan:

Please advise how we can reindex Solr when fields have stored=false. We
cannot reindex the data from the beginning; we just want to read and write
the indexes through SolrJ. Please advise a solution. I know we can do it
using the Lucene classes IndexReader and IndexWriter, but we want to index
all fields.


This is not possible. At least not when the index is modified in any way 
(stemmed, lowercased, tokenized, etc.).


The original data is not saved when stored is false. You'll need your 
original source data to reindex then.


-Kuli


Explicitly tell Solr the analyzed value when indexing a document

2011-11-16 Thread Tim Terlegård
Hi,

I have a couple of string fields. For some of them I want my
application to be able to index a lowercased string but store the
original value. Is there some way to do this? Or would I have to come
up with a new field type and implement an analyzer?

/Tim
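For the archives: a stored field always returns the original input verbatim -- analysis only affects the indexed terms -- so a field type along these lines gives a lowercased single-token index while returning the raw value (a sketch against a Solr 3.x schema.xml; the names are made up):

```xml
<!-- hypothetical type: one lowercased token in the index, raw value stored -->
<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_exact" type="string_lc" indexed="true" stored="true"/>
```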