no match or wrong match results

2011-07-25 Thread deniz
Here is the situation:

When I make a search with 3 or more words, the results are correct. However, if
I make a search using only one or two words, there is no result, although
there must be...

e.g.

query = stephan ruhl germany munich
results are correct, documents with the words above are retrieved

however

query = stephan ruhl
results are not correct, or there is even no result, while some of the
matching documents should be shown.


any ideas about the issue?

-
Smart, but he doesn't work... If he worked, he could do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/no-match-or-wrong-match-results-tp3199554p3199554.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to make a valid date facet query?

2011-07-25 Thread Floyd Wu
Hi all,

I need to make a date-faceted query, and I tried to use facet.range but can't
get the result I need.

I want to make 4 facets like the following:

1 Month, 3 Months, 6 Months, more than 1 Year

The onlinedate field in schema.xml looks like this:

[field definition elided by the archive's tag stripping]

I hit the solr by this url

http://localhost:8983/solr/select/?q=*%3A*
&start=0
&rows=10
&indent=on
&facet=true
&facet.range=onlinedate
&f.onlinedate.facet.range.start=NOW-1YEARS
&f.onlinedate.facet.range.end=NOW%2B1YEARS
&f.onlinedate.facet.range.gap=NOW-1MONTHS, NOW-3MONTHS,
NOW-6MONTHS,NOW-1YEAR

But Solr complained: Exception during facet.range of onlinedate
org.apache.solr.common.SolrException: Can't add gap NOW-1MONTHS,
NOW-3MONTHS, NOW-6MONTHS,NOW-1YEAR to value Mon Jul 26 11:56:40 CST 2010 for


What is the correct way to get this requirement realized? Please help with
this.
Floyd
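
(A hedged aside on the error above: facet.range.gap takes a single duration
expression such as +1MONTH, not a comma-separated list. Irregular buckets like
the four requested are usually expressed as separate facet.query parameters
instead; a sketch against the same onlinedate field, with URL-escaping of
brackets and spaces omitted for readability:

http://localhost:8983/solr/select/?q=*%3A*
&facet=true
&facet.query=onlinedate:[NOW-1MONTH TO NOW]
&facet.query=onlinedate:[NOW-3MONTHS TO NOW]
&facet.query=onlinedate:[NOW-6MONTHS TO NOW]
&facet.query=onlinedate:[* TO NOW-1YEAR]

Each facet.query returns its own count, so all four buckets come back in one
request.)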


Re: Document IDs instead of count for facets?

2011-07-25 Thread Jeff Schmidt
Hi Yonik:

On Jul 17, 2011, at 9:30 AM, Yonik Seeley wrote:

> On Sun, Jul 17, 2011 at 10:38 AM, Jeff Schmidt  wrote:
>> I don't want to query for a particular facet value, but rather have Solr do 
>> a grouping of facet values. I'm not sure about the appropriate nomenclature 
>> there. But, I have a multi-valued field named "process" that can have values 
>> such as "catalysis", "activation", "inhibition", "expression", 
>> "modification", "reaction" etc.  About ~100K documents are indexed where 
>> this field may have none or one or more of these processes.
>> 
>> When the client makes a request, I need to tell it that for the process 
>> "catalysis", refer to documents 1,5,6,8,32 etc., and for "modification", 
>> documents 43545,22,2134, etc.
> 
> This sounds like grouping:
> http://wiki.apache.org/solr/FieldCollapsing
> 
> Unfortunately it only works on single values fields, and you can't
> sort based on numbers of matches either.

Oh man, so close!  That looks very usable for dealing with my problem, well 
except for the multi-valued fields thing... :(

> The closest you can get today is to issue 2 requests... the first a
> faceting request to get the top constraints, and then a second that
> uses group.query for each constraint you are interested in.

Hmm, this gets onerous rather quickly. I need to get the document IDs for all 
(non-zero count) facet values, not just the top ones. I can see where you're 
going with this.  For example, I issue the faceting query to learn all relevant 
values for the disease facet:



[XML response elided by the archive's tag stripping. Only the values survive:
responseHeader status 0, QTime 15; params true, id, 1, *:*,
n_cellreg_diseaseExact, partner-xyz, n_pathway_id:ING\:ci0, 0; and the facet
counts for the disease field: 29, 26, 21, 21, 18, 15, 15, ..., 2, 2, 2.]

Note that there are actually 100 diseases returned for this one facet (of
five). The filter query on n_pathway_id defines a set of documents that 
represent nodes on a biological pathway.  Using just the top three values for 
that particular facet, the grouping query gives me what I want:




[XML response elided by the archive's tag stripping. Only the values survive:
responseHeader status 0, QTime 1; params id, 3, *:*, the three group.query
clauses (n_cellreg_diseaseExact:hypertrophy, n_cellreg_diseaseExact:neoplasia,
n_cellreg_diseaseExact:cancer), true, partner-xyz, n_pathway_id:ING\:ci0.
Each of the three groups reports matches=59 and three document IDs, in query
order: ING:5z7, ING:61b, ING:6ii; then ING:61b, ING:6ii, ING:592; then
ING:5fz, ING:61b, ING:6ii.]

So, for each value of the facet, there are the document IDs.  But if I want 
this for all 100 diseases, I need to add 100 group.query parameters.  Is that a 
problem, other than URL length? And I have other facets that can also have a 
large number of values with non-zero counts. Also, it seems SolrJ 3.3.0 does 
not support grokking the group query response. 
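
(A hedged aside on the URL-length concern: Solr also accepts query parameters
in an HTTP POST body, which sidesteps URL limits; a sketch, assuming the
standard /select handler:

POST http://localhost:8983/solr/select
Content-Type: application/x-www-form-urlencoded

q=*:*&group=true&fl=id&rows=3
&group.query=n_cellreg_diseaseExact:hypertrophy
&group.query=n_cellreg_diseaseExact:neoplasia
... one group.query per facet value ...
&fq=n_pathway_id:ING\:ci0

Servlet containers typically allow much larger POST bodies than URLs, though
100+ group.query clauses will still cost query time.)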

Just for grins I did try using group.field, and like you said, Solr does not 
like that on multi-valued fields. :) I guess I'll have to keep thinking on this 
one.  If perchance I get inspired to look at the Solr source code for how the 
facet counts are calculated, to see if the document IDs can be made available, 
can you point me to where I should be looking?  Or, better yet, do you have 
any idea when group.field will support multi-valued fields?

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068











Re: Logically equivalent queries but vastly different no of results?

2011-07-25 Thread cnyee
Yes - I am using edismax, but the reason is not obvious to me. Can you give
me a pointer?

Thanks
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Logically-equivalent-queries-but-vastly-different-no-of-results-tp3190278p3199362.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to query solr status

2011-07-25 Thread ZiLi
Does anybody know how to ask a Solr server whether its index is optimized or not?
Since replication can be configured so that slaves pull the index after an
"optimize", I think there must be some way to query that. But I didn't find any
document describing it. Does anyone know?
Thanks so much O(n_n)O
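
(A hedged pointer: the Luke request handler, if it is enabled in
solrconfig.xml as in the example config, reports index status including an
"optimized" flag; a sketch:

http://localhost:8983/solr/admin/luke?numTerms=0

Look for <bool name="optimized">true</bool> in the index section of the
response.)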


Re: Updating fields in an existing document

2011-07-25 Thread Chris Hostetter
: As in http://wiki.apache.org/solr/UpdateXmlMessages?

Exactly ... the title is "XML Messages for Updating a Solr Index"

But i do see some confusing usages of "add/update" in the context of 
documents that definitely don't belong there -- so i've changed them to 
"add/replace". 

Thanks for bringing this up.

-Hoss


Re: commit time and lock

2011-07-25 Thread Jonathan Rochkind

Thanks, this is helpful.

I do indeed periodically update or delete just about every doc in the 
index, so it makes sense that optimization might be necessary even 
post-1.4, but I'm still on 1.4 -- add this to the list of things to look into, 
rather than assume, after I upgrade.


Indeed I was aware that it would trigger a pretty complete index 
replication, but, since it seemed to greatly improve performance (in 
1.4), so it goes. But yes, I'm STILL only updating once a day, even with 
all that. (And in fact, I'm only replicating once a day too, ha).


On 7/25/2011 10:50 AM, Erick Erickson wrote:

Yeah, the 1.4 code base is "older". That is, optimization will have more
effect on that vintage code than on 3.x and trunk code.

I should have been a bit more explicit in that other thread. In the case
where you add a bunch of documents, optimization doesn't buy you all
that much currently. If you delete a bunch of docs (or update a bunch of
existing docs), then optimization will reclaim resources. So you *could*
have a case where the size of your index shrank drastically after
optimization (say you updated the same 100K documents 10 times then
optimized).

But even that is "it depends" (tm). The new segment merging, as I remember,
will possibly reclaim deleted resources, but I'm parroting people who actually
know, so you might want to verify that if it matters to you.

Optimization will almost certainly trigger a complete index replication to any
slaves configured, though.

So the usual advice is to optimize maybe once a day or week during off hours
as a starting point unless and until you can verify that your
particular situation
warrants optimizing more frequently.

Best
Erick

On Fri, Jul 22, 2011 at 11:53 AM, Jonathan Rochkind  wrote:

How old is 'older'?  I'm pretty sure I'm still getting much faster performance 
on an optimized index in Solr 1.4.

This could be due to the nature of my index and queries (which include some 
medium-sized stored fields, and extensive faceting -- faceting on up to a 
dozen fields in every request, where each field can include millions of unique 
values. Amazing I can do this with good performance at all!).

It's also possible I'm wrong about that faster performance; I haven't done 
robustly valid benchmarking on a clone of my production index yet. But it 
really looks that way to me, from what investigation I have done.

If the answer is that optimization is believed no longer necessary on versions 
LATER than 1.4, that might be the simplest explanation.

From: Pierre GOSSE [pierre.go...@arisem.com]
Sent: Friday, July 22, 2011 10:23 AM
To: solr-user@lucene.apache.org
Subject: RE: commit time and lock

Hi Mark

I've read that in a thread title " Weird optimize performance degradation", where Erick Erickson 
states that "Older versions of Lucene would search faster on an optimized index, but this is no longer 
necessary.", and more recently in a thread you initiated a month ago "Question about 
optimization".

I'll also be very interested if anyone has more precise ideas/data on the
benefits and tradeoffs of optimize vs. merge ...

Pierre


-----Original Message-----
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
Sent: Friday, July 22, 2011 15:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?
"I've read here that optimization is not always a requirement to have an
efficient index, due to some low level changes in lucene 3.xx"

Marc.

On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE wrote:


Solr will respond to searches during optimization, but commits will have to
wait until the end of the optimization process.

During optimization a new index is generated on disk by merging every
single file of the current index into one big file, so your server will be
busy, especially regarding disk access. This may increase your response time
and has a very negative effect on the replication of the index if you have a
master/slave architecture.

I've read here that optimization is not always a requirement to have an
efficient index, due to some low-level changes in Lucene 3.x, so maybe you
don't really need optimization. What version of Solr are you using? Maybe
someone can point toward a relevant link about optimization other than the Solr
wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre
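
(For reference, a hedged sketch of explicitly triggering an optimize against
the standard XML update handler; how much of the work the call waits for is
governed by the optional waitFlush/waitSearcher flags, and behavior varies a
bit by version:

POST http://localhost:8983/solr/update
Content-Type: text/xml

<optimize waitFlush="false" waitSearcher="false"/>

The same element can be sent with curl --data-binary, or scheduled from cron
for the once-a-day off-hours optimize suggested above.)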


-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Friday, July 22, 2011 12:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Thanks for the clarity.

One more thing I want to know about optimization.

Right now I am planning to optimize the server every 24 hours. Optimization
also takes time (last time it took around 13 minutes), so I want to know:

1. While optimization is in progress, will the Solr server respond or not?
2. If the server will not respond, then how can optimization be done faster, or
othe

Re: Updating fields in an existing document

2011-07-25 Thread Grant Ingersoll
This is a pretty low level issue with inverted indexes (i.e. the underlying 
data structure used) and not so much the architecture.  It is possible, I 
suppose, to solve it at the architectural level, but in many cases this causes 
performance problems that are not usually acceptable.

On Jul 20, 2011, at 7:08 PM, Jonathan Rochkind wrote:

> Nope, you're not missing anything, there's no way to alter a document in an 
> index but reindexing the whole document. Solr's architecture would make it 
> difficult (although never say impossible) to do otherwise. But you're right 
> it would be convenient for people other than you. 
> 
> Reindexing a single document ought not to be slow, although if you have many 
> of them at once it could be, or if you end up needing to very frequently 
> commit to an index it can indeed cause problems. 
> 
> From: Benson Margulies [bimargul...@gmail.com]
> Sent: Wednesday, July 20, 2011 6:05 PM
> To: solr-user
> Subject: Updating fields in an existing document
> 
> We find ourselves in the following quandary:
> 
> At initial index time, we store a value in a field, and we use it for
> faceting. So it, seemingly, has to be there as a field.
> 
> However, from time to time, something happens that causes us to want
> to change this value. As far as we know, this requires us to
> completely re-index the document, which is slow.
> 
> It struck me that we can't be the only people to go down this road, so
> I write to inquire if we are missing something.

--
Grant Ingersoll
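
(A hedged illustration of the re-add-to-replace behavior discussed above; the
field names here are hypothetical, and the schema's uniqueKey is assumed to be
"id". Posting a full document with an existing id replaces the old one
wholesale, so every field must be resent, not just the changed one:

<add>
  <doc>
    <field name="id">DOC1</field>
    <field name="facet_value">new-value</field>
    <!-- ...every other stored field must be included again... -->
  </doc>
</add>

There is no partial-update operation at this point; the whole document is
reindexed.)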





Re: please help explaining debug output

2011-07-25 Thread Erick Erickson
Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good
idea since this seems like it *should* work.

Erick

On Mon, Jul 25, 2011 at 5:32 PM, Robert Petersen  wrote:
> I'm still on solr 1.4.0 and the analysis page looks like they should match, 
> and other products with the same content do in fact match.  I'm reindexing 
> the non-matching ones to rule that out.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, July 25, 2011 1:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: please help explaining debug output
>
> Hmmm, I'm assuming that moreWords is your default text field, yes?
>
> But it works for me (tm), using 1.4.1. What version of Solr are you on?
>
> Also, take a glance at the admin/analysis page, that might help...
>
> Gotta run
>
> Erick
>
> On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen  wrote:
>> Sorry, to clarify a search for P1102W matches all three docs but a
>> search for p1102w LaserJet only matches the second two.  Someone asked
>> me a question while I was typing and I got distracted, apologies for any
>> confusion.
>>
>> -Original Message-
>> From: Robert Petersen [mailto:rober...@buy.com]
>> Sent: Monday, July 25, 2011 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: please help explaining debug output
>>
>> I have three documents with the following product titles in a text field
>> called moreWords with analysis stack matching the solr example text
>> field definition.
>>
>>
>>
>> 1.       HP LaserJet P1102W Monochrome Laser Printer
>> > oc/101/213824965.html>
>>
>> 2.       HP CE285A (85A) Remanufactured Black Toner Cartridge for
>> LaserJet M1212nf, P1102, P1102W Series
>> > dge-for-laserjet/q/loc/101/217145536.html>
>>
>> 3.       Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
>> M1130, LaserJet M1132, LaserJet M1210
>> > 102w-laserjet-m1130/q/loc/101/222045267.html>
>>
>>
>>
>> A search for P1102W matches (2) and (3), but not (1) above.  Can someone
>> explain the debug output?  It looks like I am getting a non-match on (1)
>> because term frequency is zero?  Am I reading that right?  If so, how
>> could that be? the searched terms are equivalently in all three docs.  I
>> don't get it.
>>
>>
>>
>>
>>
>> 
>>
>> p1102w LaserJet 
>>
>> p1102w LaserJet 
>>
>> +PhraseQuery(moreWords:"p 1102 w")
>> +PhraseQuery(moreWords:"laser jet")
>>
>> +moreWords:"p 1102 w" +moreWords:"laser
>> jet"
>>
>> 
>>
>> 
>>
>> 3.64852 = (MATCH) sum of:
>>
>>  2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:
>>
>>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>>
>>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>
>>      0.041507367 = queryNorm
>>
>>    3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product
>> of:
>>
>>      1.7320508 = tf(phraseFreq=3.0)
>>
>>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>
>>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>>
>>  1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:
>>
>>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>
>>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>>
>>      0.041507367 = queryNorm
>>
>>    1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product
>> of:
>>
>>      1.4142135 = tf(phraseFreq=2.0)
>>
>>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>>
>>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>>
>>
>>
>> 
>>
>> 
>>
>> 2.8656518 = (MATCH) sum of:
>>
>>  1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:
>>
>>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>>
>>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>
>>      0.041507367 = queryNorm
>>
>>    1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product
>> of:
>>
>>      1.0 = tf(phraseFreq=1.0)
>>
>>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>
>>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>>
>>  1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:
>>
>>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>
>>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>>
>>      0.041507367 = queryNorm
>>
>>    2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product
>> of:
>>
>>      1.7320508 = tf(phraseFreq=3.0)
>>
>>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>>
>>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>>
>>
>>
>> 
>>
>> 
>>
>> sku:213824965
>>
>> 
>>
>> 
>>
>> 
>>
>> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
>> clause(s)
>>
>>  1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:
>>
>>    0.7955347 = queryWeight(moreWords:

Re: Updating fields in an existing document

2011-07-25 Thread Benson Margulies
As in http://wiki.apache.org/solr/UpdateXmlMessages?

On Mon, Jul 25, 2011 at 4:10 PM, Chris Hostetter
 wrote:
> : A followup. The wiki has a whole discussion of the 'update' XML
> : message. But solrj has nothing like it. Does that really exist? Is
> : there a reason to use it? If I just 'add' the document a second time,
> : it will replace?
>
> You should only see "update" in Solr docs used in the context of
> "updating" the index by adding (which might be replacing) or deleting
> documents.  (you'll note there is no "<update>" tag or anything like that
> in the XML syntax)
>
>
> -Hoss
>


RE: please help explaining debug output

2011-07-25 Thread Robert Petersen
I'm still on solr 1.4.0 and the analysis page looks like they should match, and 
other products with the same content do in fact match.  I'm reindexing the 
non-matching ones to rule that out.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, July 25, 2011 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: please help explaining debug output

Hmmm, I'm assuming that moreWords is your default text field, yes?

But it works for me (tm), using 1.4.1. What version of Solr are you on?

Also, take a glance at the admin/analysis page, that might help...

Gotta run

Erick

On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen  wrote:
> Sorry, to clarify a search for P1102W matches all three docs but a
> search for p1102w LaserJet only matches the second two.  Someone asked
> me a question while I was typing and I got distracted, apologies for any
> confusion.
>
> -Original Message-
> From: Robert Petersen [mailto:rober...@buy.com]
> Sent: Monday, July 25, 2011 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: please help explaining debug output
>
> I have three documents with the following product titles in a text field
> called moreWords with analysis stack matching the solr example text
> field definition.
>
>
>
> 1.       HP LaserJet P1102W Monochrome Laser Printer
>  oc/101/213824965.html>
>
> 2.       HP CE285A (85A) Remanufactured Black Toner Cartridge for
> LaserJet M1212nf, P1102, P1102W Series
>  dge-for-laserjet/q/loc/101/217145536.html>
>
> 3.       Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
> M1130, LaserJet M1132, LaserJet M1210
>  102w-laserjet-m1130/q/loc/101/222045267.html>
>
>
>
> A search for P1102W matches (2) and (3), but not (1) above.  Can someone
> explain the debug output?  It looks like I am getting a non-match on (1)
> because term frequency is zero?  Am I reading that right?  If so, how
> could that be? the searched terms are equivalently in all three docs.  I
> don't get it.
>
>
>
>
>
> 
>
> p1102w LaserJet 
>
> p1102w LaserJet 
>
> +PhraseQuery(moreWords:"p 1102 w")
> +PhraseQuery(moreWords:"laser jet")
>
> +moreWords:"p 1102 w" +moreWords:"laser
> jet"
>
> 
>
> 
>
> 3.64852 = (MATCH) sum of:
>
>  2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product
> of:
>
>      1.7320508 = tf(phraseFreq=3.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>
>  1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:
>
>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.041507367 = queryNorm
>
>    1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product
> of:
>
>      1.4142135 = tf(phraseFreq=2.0)
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>
>
>
> 
>
> 
>
> 2.8656518 = (MATCH) sum of:
>
>  1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product
> of:
>
>      1.0 = tf(phraseFreq=1.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>
>  1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:
>
>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.041507367 = queryNorm
>
>    2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product
> of:
>
>      1.7320508 = tf(phraseFreq=3.0)
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>
>
>
> 
>
> 
>
> sku:213824965
>
> 
>
> 
>
> 
>
> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
> clause(s)
>
>  1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    1.4973521 = fieldWeight(moreWords:"p 1102 w" in 32497), product of:
>
>      1.0 = tf(phraseFreq=1.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.078125 = fieldNorm(field=moreWords, doc=32497)
>
>  0.0 = no match on 

Re: multivalue or denormalise

2011-07-25 Thread abhayd
hi erick,

I will be searching only on search_term.

I did exactly as you said, in the application layer.

I was not sure how multi-valued fields work in correlation with each other.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multivalue-or-denormalise-tp3197942p3198710.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: please help explaining debug output

2011-07-25 Thread Erick Erickson
Hmmm, I'm assuming that moreWords is your default text field, yes?

But it works for me (tm), using 1.4.1. What version of Solr are you on?

Also, take a glance at the admin/analysis page, that might help...

Gotta run

Erick

On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen  wrote:
> Sorry, to clarify a search for P1102W matches all three docs but a
> search for p1102w LaserJet only matches the second two.  Someone asked
> me a question while I was typing and I got distracted, apologies for any
> confusion.
>
> -Original Message-
> From: Robert Petersen [mailto:rober...@buy.com]
> Sent: Monday, July 25, 2011 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: please help explaining debug output
>
> I have three documents with the following product titles in a text field
> called moreWords with analysis stack matching the solr example text
> field definition.
>
>
>
> 1.       HP LaserJet P1102W Monochrome Laser Printer
>  oc/101/213824965.html>
>
> 2.       HP CE285A (85A) Remanufactured Black Toner Cartridge for
> LaserJet M1212nf, P1102, P1102W Series
>  dge-for-laserjet/q/loc/101/217145536.html>
>
> 3.       Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
> M1130, LaserJet M1132, LaserJet M1210
>  102w-laserjet-m1130/q/loc/101/222045267.html>
>
>
>
> A search for P1102W matches (2) and (3), but not (1) above.  Can someone
> explain the debug output?  It looks like I am getting a non-match on (1)
> because term frequency is zero?  Am I reading that right?  If so, how
> could that be? the searched terms are equivalently in all three docs.  I
> don't get it.
>
>
>
>
>
> 
>
> p1102w LaserJet 
>
> p1102w LaserJet 
>
> +PhraseQuery(moreWords:"p 1102 w")
> +PhraseQuery(moreWords:"laser jet")
>
> +moreWords:"p 1102 w" +moreWords:"laser
> jet"
>
> 
>
> 
>
> 3.64852 = (MATCH) sum of:
>
>  2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product
> of:
>
>      1.7320508 = tf(phraseFreq=3.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>
>  1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:
>
>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.041507367 = queryNorm
>
>    1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product
> of:
>
>      1.4142135 = tf(phraseFreq=2.0)
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6667236)
>
>
>
> 
>
> 
>
> 2.8656518 = (MATCH) sum of:
>
>  1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product
> of:
>
>      1.0 = tf(phraseFreq=1.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>
>  1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:
>
>    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.041507367 = queryNorm
>
>    2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product
> of:
>
>      1.7320508 = tf(phraseFreq=3.0)
>
>      14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>      0.09375 = fieldNorm(field=moreWords, doc=6684158)
>
>
>
> 
>
> 
>
> sku:213824965
>
> 
>
> 
>
> 
>
> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
> clause(s)
>
>  1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:
>
>    0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.041507367 = queryNorm
>
>    1.4973521 = fieldWeight(moreWords:"p 1102 w" in 32497), product of:
>
>      1.0 = tf(phraseFreq=1.0)
>
>      19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>
>      0.078125 = fieldNorm(field=moreWords, doc=32497)
>
>  0.0 = no match on required clause (moreWords:"laser jet")
>
>    0.0 = weight(moreWords:"laser jet" in 32497), product of:
>
>      0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>
>        14.597603 = idf(moreWords: laser=26731 jet=12685)
>
>        0.041507367 = queryNorm
>
>      0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:
>
>        0.0 = tf(phraseFreq=0.0)
>
>        14.5

Re: dih fetching but not adding records to index

2011-07-25 Thread abhayd
thanks!! it worked.

I was just wondering if XPath can be used to process the default XML format
for Solr index docs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dih-fetching-but-not-adding-records-to-index-tp3189438p3198705.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: please help explaining debug output

2011-07-25 Thread Robert Petersen
Sorry, to clarify: a search for P1102W matches all three docs, but a
search for p1102w LaserJet only matches the second two.  Someone asked
me a question while I was typing and I got distracted; apologies for any
confusion.

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com] 
Sent: Monday, July 25, 2011 1:42 PM
To: solr-user@lucene.apache.org
Subject: please help explaining debug output

I have three documents with the following product titles in a text field
called moreWords with analysis stack matching the solr example text
field definition.

 

1.   HP LaserJet P1102W Monochrome Laser Printer
 

2.   HP CE285A (85A) Remanufactured Black Toner Cartridge for
LaserJet M1212nf, P1102, P1102W Series
 

3.   Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
M1130, LaserJet M1132, LaserJet M1210
 

 

A search for P1102W matches (2) and (3), but not (1) above.  Can someone
explain the debug output?  It looks like I am getting a non-match on (1)
because term frequency is zero?  Am I reading that right?  If so, how
could that be? the searched terms are equivalently in all three docs.  I
don't get it.

 

 



rawquerystring: p1102w LaserJet

querystring: p1102w LaserJet

parsedquery: +PhraseQuery(moreWords:"p 1102 w")
+PhraseQuery(moreWords:"laser jet")

parsedquery_toString: +moreWords:"p 1102 w" +moreWords:"laser
jet"





3.64852 = (MATCH) sum of:

  2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product
of:

  1.7320508 = tf(phraseFreq=3.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.09375 = fieldNorm(field=moreWords, doc=6667236)

  1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:

0.60590804 = queryWeight(moreWords:"laser jet"), product of:

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.041507367 = queryNorm

1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product
of:

  1.4142135 = tf(phraseFreq=2.0)

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.09375 = fieldNorm(field=moreWords, doc=6667236)

 





2.8656518 = (MATCH) sum of:

  1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product
of:

  1.0 = tf(phraseFreq=1.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.09375 = fieldNorm(field=moreWords, doc=6684158)

  1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:

0.60590804 = queryWeight(moreWords:"laser jet"), product of:

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.041507367 = queryNorm

2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product
of:

  1.7320508 = tf(phraseFreq=3.0)

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.09375 = fieldNorm(field=moreWords, doc=6684158)

 





sku:213824965







0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
clause(s)

  1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

1.4973521 = fieldWeight(moreWords:"p 1102 w" in 32497), product of:

  1.0 = tf(phraseFreq=1.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.078125 = fieldNorm(field=moreWords, doc=32497)

  0.0 = no match on required clause (moreWords:"laser jet")

0.0 = weight(moreWords:"laser jet" in 32497), product of:

  0.60590804 = queryWeight(moreWords:"laser jet"), product of:

14.597603 = idf(moreWords: laser=26731 jet=12685)

0.041507367 = queryNorm

  0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:

0.0 = tf(phraseFreq=0.0)

14.597603 = idf(moreWords: laser=26731 jet=12685)

0.078125 = fieldNorm(field=moreWords, doc=32497)

 







please help explaining debug output

2011-07-25 Thread Robert Petersen
I have three documents with the following product titles in a text field
called moreWords with analysis stack matching the solr example text
field definition.

 

1.   HP LaserJet P1102W Monochrome Laser Printer
 

2.   HP CE285A (85A) Remanufactured Black Toner Cartridge for
LaserJet M1212nf, P1102, P1102W Series
 

3.   Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
M1130, LaserJet M1132, LaserJet M1210
 

 

A search for P1102W matches (2) and (3), but not (1) above.  Can someone
explain the debug output?  It looks like I am getting a non-match on (1)
because term frequency is zero?  Am I reading that right?  If so, how
could that be? the searched terms are equivalently in all three docs.  I
don't get it.

 

 



rawquerystring: p1102w LaserJet

querystring: p1102w LaserJet

parsedquery: +PhraseQuery(moreWords:"p 1102 w")
+PhraseQuery(moreWords:"laser jet")

parsedquery_toString: +moreWords:"p 1102 w" +moreWords:"laser
jet"





3.64852 = (MATCH) sum of:

  2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product
of:

  1.7320508 = tf(phraseFreq=3.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.09375 = fieldNorm(field=moreWords, doc=6667236)

  1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:

0.60590804 = queryWeight(moreWords:"laser jet"), product of:

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.041507367 = queryNorm

1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product
of:

  1.4142135 = tf(phraseFreq=2.0)

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.09375 = fieldNorm(field=moreWords, doc=6667236)

 





2.8656518 = (MATCH) sum of:

  1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product
of:

  1.0 = tf(phraseFreq=1.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.09375 = fieldNorm(field=moreWords, doc=6684158)

  1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:

0.60590804 = queryWeight(moreWords:"laser jet"), product of:

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.041507367 = queryNorm

2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product
of:

  1.7320508 = tf(phraseFreq=3.0)

  14.597603 = idf(moreWords: laser=26731 jet=12685)

  0.09375 = fieldNorm(field=moreWords, doc=6684158)

 





sku:213824965







0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
clause(s)

  1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:

0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.041507367 = queryNorm

1.4973521 = fieldWeight(moreWords:"p 1102 w" in 32497), product of:

  1.0 = tf(phraseFreq=1.0)

  19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)

  0.078125 = fieldNorm(field=moreWords, doc=32497)

  0.0 = no match on required clause (moreWords:"laser jet")

0.0 = weight(moreWords:"laser jet" in 32497), product of:

  0.60590804 = queryWeight(moreWords:"laser jet"), product of:

14.597603 = idf(moreWords: laser=26731 jet=12685)

0.041507367 = queryNorm

  0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:

0.0 = tf(phraseFreq=0.0)

14.597603 = idf(moreWords: laser=26731 jet=12685)

0.078125 = fieldNorm(field=moreWords, doc=32497)
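
(A hedged note on reading the explain output above: with the example text
field's WordDelimiterFilter, P1102W is parsed at query time into the phrase
"p 1102 w", so a document matches only if index-time analysis produced the
same tokens at adjacent positions. The phraseFreq=0.0 for "laser jet" on doc
32497 means the indexed positions of laser/jet did not line up as a phrase
there. The 1.4 analysis page shows index- and query-time token streams side
by side:

http://localhost:8983/solr/admin/analysis.jsp

If the stored content predates an analyzer change, reindexing that document is
the usual fix.)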

 







Re: Updating fields in an existing document

2011-07-25 Thread Chris Hostetter
: A followup. The wiki has a whole discussion of the 'update' XML
: message. But solrj has nothing like it. Does that really exist? Is
: there a reason to use it? If I just 'add' the document a second time,
: it will replace?

You should only see "update" in Solr docs used in the context of 
"updating" the index by adding (which might be replacing) or deleting 
documents.  (you'll note there is no "<update>" tag or anything like that 
in the XML syntax) 


-Hoss


Re: multivalue or denormalise

2011-07-25 Thread Erick Erickson
I'm a little confused. Are you searching against these
different titles or is the search something else and you're really
only interested in displaying different titles for documents returned
for the query?

If it's just a display issue, you can use multivalued fields, the order
in which you put values in the fields is the order in which they're
returned and your application layer can decide whether to display
the title or not. This implies that you return all the titles and dates with
each document to the app...

If you're searching against titles and dates, that's a different case, you'd
be adding some clause like
AND ((title_1:yourtitle AND st_date_1:[* TO NOW] AND end_date_1:[NOW
TO *]) OR (same for title2) OR )

Best
Erick

On Mon, Jul 25, 2011 at 12:27 PM, abhayd  wrote:
> hi
>
>
> What i want to do is get title_1 if NOW is between st_date_1 and end_date_1
> Also at the same time   get title_2 if NOW is between st_date_2 and
> end_date_2
>
> and so on
>
> At present I have a schema like this, denormalized. I can't figure out a
> single Solr query to do this.
>
> [schema field definitions elided by the archive's tag stripping; per the
> description they include title_N, st_date_N, and end_date_N fields]
>
> Any help getting correct query?
>
> Also i thought of using multivalue filed like
> title , st_date, end_date, But I am not sure if two multivalue fields
> co-relate with each other. That is if today's date does not fall in first
> st_date and first end_date then i dont want to have first title in results.
> Is that possible ?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multivalue-or-denormalise-tp3197942p3197942.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
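
(Spelling out Erick's clause as a full request for one title slot, a hedged
sketch in which "yourtitle" is a placeholder:

q=title_1:yourtitle AND st_date_1:[* TO NOW] AND end_date_1:[NOW TO *]

The [* TO NOW] / [NOW TO *] pair keeps only documents whose window brackets
the current time; repeat the parenthesized group, OR'd together, for
title_2/st_date_2/end_date_2 and so on.)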


Re: Ignore records that are missing a value in a field

2011-07-25 Thread Erick Erickson
Hmmm, I think that should be fq=field:[* TO *] since the problem is to
include docs with anything in that field.

You could also index a special "EMPTY_FIELD" string in otherwise-empty fields
and do something like fq=-field:EMPTY_FIELD...

Best
Erick

On Mon, Jul 25, 2011 at 12:10 PM, Stefan Matheis
 wrote:
> So, in other words you want to exclude those records? :)
> A FilterQuery with a range query could help: fq=-field:[* TO *]
>
> Regards
> Stefan
>
> Am 25.07.2011 17:53, schrieb Brian Lamb:
>>
>> Hi all,
>>
>> I have an optional field called "common_names". I would like to keep this
>> field optional but at the same, occasionally do a search where I do not
>> include results where there is no value set for this field. Is this
>> possible
>> to do within solr?
>>
>> In other words, I would like to do a search where if there is no value set
>> for common_names, I would not want that record included in the search
>> result.
>>
>> Thanks,
>>
>> Brian Lamb
>>
>
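
(Concretely for the field in the original question, a hedged example:
fq=common_names:[* TO *] restricts results to documents where common_names has
at least one value, and since it is just a filter query it can be added or
dropped per request, keeping the field itself optional.)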


CoreAdminHandler: can I specify custom properties when creating cores?

2011-07-25 Thread Yury Kats
When creating cores through solr.xml, I am able to specify custom
properties, to be referenced in solrconfig.xml. For example:

[solr.xml snippet elided by the archive's tag stripping]

This would create a master core and a slave core, participating in replication,
both sharing the same solrconfig.xml for replication setup.
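
(The elided snippet was presumably along the lines of the per-core property
syntax from the replication wiki; a hedged reconstruction, with the
instanceDir values assumed:

<cores adminPath="/admin/cores">
  <core name="master" instanceDir="core0">
    <property name="enable.master" value="true"/>
    <property name="enable.slave" value="false"/>
  </core>
  <core name="slave" instanceDir="core1">
    <property name="enable.master" value="false"/>
    <property name="enable.slave" value="true"/>
  </core>
</cores>

solrconfig.xml can then gate the replication handler on
${enable.master:false} and ${enable.slave:false}.)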

Is there a way to specify such properties when creating cores through a 
CoreAdminHandler
request [1]?

Thanks,
Yury

[1] http://wiki.apache.org/solr/CoreAdmin#CREATE


Re: Wiki Error JSON syntax

2011-07-25 Thread Remy Loubradou
Hi,

2011/7/25 Gabriel Farrell 

> On Mon, Jul 25, 2011 at 12:24 PM, Stefan Matheis
>  wrote:
> > Hi Remy,
> >
> > so you may open an Issue for this on the github Project? i mean .. just
> > creating another client, because i have one problem, does not sound like
> a
> > good plan?
>
> Agreed, and thanks for calling my attention to this thread, Stefan.
>

Yes, I agree too. I'm not used to pull requests and submitting issues; it's new for me :). I
will open an issue.

>
> > Regards
> > Stefan
> >
> > On 25.07.2011 10:56, Remy Loubradou wrote:
> >>
> >> Hey Stephan,
> >>
> >> Thanks, but I already used this solr client and I got an error when I
> add
> >> too much documents "FATAL ERROR: JS Allocation failed - process out of
> >> memory".
> >> I didn't find the source of the problem in the solr client. So I decided
> >> to
> >> write my own without this error hopefully and also I'm using JSON
> >> documents
> >> and not XML documents. I read a post saying that I can get better
> >> performance using JSON documents.
> >>
> >> I will release this client as an npm module.
>
> How many documents are you attempting to add at once when you get that
> error? Would it possible to chunk them into smaller groups?
>

I add a document and commit afterwards. But it's a very big XML document, ~130MB,
and that's happening only with big XML files.
I got this error: FATAL ERROR: JS Allocation failed - process out of memory.


>
> I'm happy to work with you on enhancing node-solr to meet your needs.
> The only reason updates are via XML rather than JSON is that 3.1 was
> new or not yet released (don't quite remember which) when I first
> wrote node-solr. Even now I imagine many people may still be using a
> version of Solr that doesn't handle JSON updates. Maybe a flag
> parameter could be added to the Client object to switch from XML to
> JSON?
>

Yes, this could be a good solution. So we can try to merge your solr-client
with my client.
Before to release my client I want to have full test coverage with vows
(have some fun :) )

>
> The node-solr client has fairly complete test coverage, a history of
> commits from the Node community, and  versions aligning with several
> versions of Node. I would appreciate your contributions, either via
> issues or pull requests.
>

Cool. I will be pleased to do that!

>
> >> Regards,
> >> Remy
> >>
> >> 2011/7/25 Stefan Matheis
> >>
> >>> Remy,
> >>>
> >>> didn't use it myself .. but you know about
> >>> https://github.com/gsf/node-solr ?
> >>>
> >>> Regards
> >>> Stefan
> >>>
> >>> On 20.07.2011 20:05, Remy Loubradou wrote:
> >>>
> >>>  I think I can trust you but this is weird.
> 
>  Funny things if you try to validate on http://jsonlint.com/ this
> JSON,
>  duplicates keys are automatically removed. But the thing is, how can
> you
>  possibly generate this json with Javascript Object?
> 
>  It will be really nice to combine both ways that you show on the page.
>  Something like:
> 
>  {
>  "add": [
>  {
>  "doc": {
>  "id": "DOC1",
>  "my_boosted_field": {
>  "boost": 2.3,
>  "value": "test"
>  },
>  "my_multivalued_field": [
>  "aaa",
>  "bbb"
>  ]
>  }
>  },
>  {
>  "commitWithin": 5000,
>  "overwrite": false,
>  "boost": 3.45,
>  "doc": {
>  "f1": "v2"
>  }
>  }
>  ],
>  "commit": {},
>  "optimize": {
>  "waitFlush": false,
>  "waitSearcher": false
>  },
>  "delete": [
>  {
>  "id": "ID"
>  },
>  {
>  "query": "QUERY"
>  }
>  ]
>  }
> 
>  Thanks you for you previous response Yonik.
> 
>  2011/7/20 Yonik
>  Seeley
> >
> 
  On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou wrote:
> >
> >> Hi,
> >> I was writing a Solr Client API for Node and I found an error on
> this
> >>
> > page
> >
> >>
> >> http://wiki.apache.org/solr/UpdateJSON, on
> >> the section "Update Commands"
> >>
> > the
> >
> >> JSON is not valid because there are duplicate keys and two times
> with
> >>
> > "add"
> >
> >> and "delete".
> >>
> >
> > It's a common misconception that it's invalid JSON.  Duplicate keys
> > are in fact legal.
>
> I can't resist addressing this side conversation.
>
> While I understand the desire for a straightforward mapping between
> the XML and JSON update formats, I think the use of duplicate keys is
> a bad idea. As noted in t

Re: Wiki Error JSON syntax

2011-07-25 Thread Gabriel Farrell
On Mon, Jul 25, 2011 at 12:24 PM, Stefan Matheis
 wrote:
> Hi Remy,
>
> so you may open an Issue for this on the github Project? i mean .. just
> creating another client, because i have one problem, does not sound like a
> good plan?

Agreed, and thanks for calling my attention to this thread, Stefan.

> Regards
> Stefan
>
> On 25.07.2011 10:56, Remy Loubradou wrote:
>>
>> Hey Stephan,
>>
>> Thanks, but I already used this solr client and I got an error when I add
>> too much documents "FATAL ERROR: JS Allocation failed - process out of
>> memory".
>> I didn't find the source of the problem in the solr client. So I decided
>> to
>> write my own without this error hopefully and also I'm using JSON
>> documents
>> and not XML documents. I read a post saying that I can get better
>> performance using JSON documents.
>>
>> I will release this client as an npm module.

How many documents are you attempting to add at once when you get that
error? Would it possible to chunk them into smaller groups?

I'm happy to work with you on enhancing node-solr to meet your needs.
The only reason updates are via XML rather than JSON is that 3.1 was
new or not yet released (don't quite remember which) when I first
wrote node-solr. Even now I imagine many people may still be using a
version of Solr that doesn't handle JSON updates. Maybe a flag
parameter could be added to the Client object to switch from XML to
JSON?

The node-solr client has fairly complete test coverage, a history of
commits from the Node community, and  versions aligning with several
versions of Node. I would appreciate your contributions, either via
issues or pull requests.

>> Regards,
>> Remy
>>
>> 2011/7/25 Stefan Matheis
>>
>>> Remy,
>>>
>>> didn't use it myself .. but you know about https://github.com/gsf/node-solr ?
>>>
>>> Regards
>>> Stefan
>>>
>>> On 20.07.2011 20:05, Remy Loubradou wrote:
>>>
>>>  I think I can trust you but this is weird.

 Funny things if you try to validate on http://jsonlint.com/ this JSON,
 duplicates keys are automatically removed. But the thing is, how can you
 possibly generate this json with Javascript Object?

 It will be really nice to combine both ways that you show on the page.
 Something like:

 {
     "add": [
         {
             "doc": {
                 "id": "DOC1",
                 "my_boosted_field": {
                     "boost": 2.3,
                     "value": "test"
                 },
                 "my_multivalued_field": [
                     "aaa",
                     "bbb"
                 ]
             }
         },
         {
             "commitWithin": 5000,
             "overwrite": false,
             "boost": 3.45,
             "doc": {
                 "f1": "v2"
             }
         }
     ],
     "commit": {},
     "optimize": {
         "waitFlush": false,
         "waitSearcher": false
     },
     "delete": [
         {
             "id": "ID"
         },
         {
             "query": "QUERY"
         }
     ]
 }

 Thanks you for you previous response Yonik.

 2011/7/20 Yonik
 Seeley
>

  On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou wrote:
>
>> Hi,
>> I was writing a Solr Client API for Node and I found an error on this
>>
> page
>
>>
>> http://wiki.apache.org/solr/UpdateJSON, on
>> the section "Update Commands"
>>
> the
>
>> JSON is not valid because there are duplicate keys and two times with
>>
> "add"
>
>> and "delete".
>>
>
> It's a common misconception that it's invalid JSON.  Duplicate keys
> are in fact legal.

I can't resist addressing this side conversation.

While I understand the desire for a straightforward mapping between
the XML and JSON update formats, I think the use of duplicate keys is
a bad idea. As noted in the spec
(http://www.ietf.org/rfc/rfc4627.txt), "The names within an object
SHOULD be unique." I'm not sure the reasons here justify ignoring that
recommendation.

The fact that you need to keep reminding people that duplicate names
are legal is a sign that it's more trouble than it's worth. Also, most
JSON parsers just punt on duplicate names (see the third paragraph in
"A word about design" at
http://planet.plt-scheme.org/package-source/dherman/json.plt/3/0/planet-docs/json/index.html
for one take on the situation). I really don't want to write a new
JavaScript JSON parser just for node-solr.


Re: Getting a weird Class Not Found Exception: SolrParams

2011-07-25 Thread Sowmya V.B.
Hi Erick

Yes, it was a classpath issue.

Sowmya.

On Mon, Jul 25, 2011 at 4:01 PM, Erick Erickson wrote:

> Well, MultiMapSolrParams is a subclass of SolrParams, so you actually
> do use it in your code 
>
> But this looks like a classpath problem. You say your code compiles,
> but do you compile against jars that you then make available
> to your servlet? And/or do you have any old jar files in your classpath?
>
> Best
> Erick
>
> On Thu, Jul 21, 2011 at 3:00 AM, Sowmya V.B.  wrote:
> > Hi All
> >
> > I have been getting this weird error since yesterday evening, whose cause I am
> > not able to figure out.
> > I made a webinterface to read and display Solr Results, which is a
> servlet
> > that calls Solr Servlet.
> > I am
> >
> > I give the query to Solr, using:
> > MultiMapSolrParams solrparamsmini =
> > SolrRequestParsers.parseQueryString(queryrequest.toString());
> > -where queryrequest contains all the ingredients of a Solr query.
> >
> > Eg:   StringBuffer queryrequest = new StringBuffer();
> >queryrequest.append("&q=" + query);
> >
> >
> queryrequest.append("&start=0&rows=30&hl=true&hl.fl=text&hl.frag=500&defType=dismax");
> >
> >
> queryrequest.append("&bq="+Field1+":["+frompercent+"%20TO%20"+topercent+"]");
> >
> > It compiles and builds without errors, but I get this error
> > "java.lang.ClassNotFoundException:
> > org.apache.solr.common.params.SolrParams", when I run the app.
> > But, I dont use SolrParams class anywhere in my code!
> >
> > Here is the stack trace:
> > INFO: Server startup in 1953 ms
> > Jul 21, 2011 8:52:20 AM org.apache.catalina.core.ApplicationContext log
> > INFO: Marking servlet solrsearch as unavailable
> > Jul 21, 2011 8:52:20 AM org.apache.catalina.core.StandardWrapperValve
> invoke
> > SEVERE: Allocate exception for servlet solrsearch
> > java.lang.ClassNotFoundException:
> org.apache.solr.common.params.SolrParams
> >at
> >
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1676)
> >at
> >
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1521)
> >at java.lang.Class.getDeclaredConstructors0(Native Method)
> >at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
> >at java.lang.Class.getConstructor0(Class.java:2699)
> >at java.lang.Class.newInstance0(Class.java:326)
> >at java.lang.Class.newInstance(Class.java:308)
> >at
> >
> org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:119)
> >at
> >
> org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1062)
> >at
> >
> org.apache.catalina.core.StandardWrapper.allocate(StandardWrapper.java:813)
> >at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:135)
> >at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
> >at
> >
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
> >at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
> >at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
> >at
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
> >at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> >at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:395)
> >at
> >
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:250)
> >at
> >
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
> >at
> >
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
> >at
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >at java.lang.Thread.run(Thread.java:680)
> >
> >
> > Anyone had this kind of issue before?
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com
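
(A hedged note for readers hitting the same ClassNotFoundException:
org.apache.solr.common.params.SolrParams ships in the SolrJ/common jar
(apache-solr-solrj-1.4.x.jar, or apache-solr-common in older releases), so the
usual fix for a Tomcat webapp is to make sure that jar is in the webapp's
WEB-INF/lib rather than only on the compile-time classpath.)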



Re: strip html from data

2011-07-25 Thread Mike Sokolov

Hmm that looks like it's working fine.  I stand corrected.


On 07/25/2011 12:24 PM, Markus Jelsma wrote:

I've seen that issue too and read comments on the list, yet I've never had
trouble with the order; I don't know what's going on. Check this analyzer; I've
moved the charFilter to the bottom:

[analyzer XML elided by the archive's tag stripping]

The analysis chain still does its job as I expect for the input:
bla bla

Index Analyzer
org.apache.solr.analysis.HTMLStripCharFilterFactory {luceneMatchVersion=LUCENE_34}
  text: bla bla
org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13
org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0, catenateNumbers=1}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word
org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=false, luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=false, luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word
org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word
org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=Dutch, luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word | keyword: false, false
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {luceneMatchVersion=LUCENE_34}
  position: 1, 2 | term text: bla, bla | startOffset: 6, 10 | endOffset: 9, 13 | type: word, word | keyword: false, false


On Monday 25 July 2011 18:07:29 Mike Sokolov wrote:
   

Hmm - I'm not sure about that; see
https://issues.apache.org/jira/browse/SOLR-2119

On 07/25/2011 12:01 PM, Markus Jelsma wrote:
 

charFilters are executed first regardless of their position in the
analyzer.

On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
   

I think you need to list the charfilter earlier in the analysis chain;
before the tokenizer.  Porbably Solr should tell you this...

-Mike

On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:
 

sounds logical. I just changed it to the following, restarted and reindexed
with commit:

[fieldType/analyzer XML elided by the archive's tag stripping]
Unfortunately that did not fix the problem. There are still HTML tags
inside the data. Although I believe there are fewer than before, I
cannot prove that. Fact is, there are still HTML tags inside the data.

Any other ideas what the problem could be?





2011/7/25 Markus Jelsma

   

You have three analyzer elements; I wonder what that would do. You need to add
the char filter to the index-time analyzer.

On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
 

Hi there,

I am trying to strip html tags from the data before adding the documents
to the index. To do that I altered schema.xml like this:

<fieldType name="..." class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

<field name="..." type="..." required="false"/>
Unfortunately this does not work; the html tags are still present after
restarting and reindexing. I also tried HTMLStripTransformer, but this did
not work either.

Does anybody have an idea how to get this done? Thank you in advance for
any hint.

Merlin
   

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

multivalue or denormalise

2011-07-25 Thread abhayd
hi


What I want to do is get title_1 if NOW is between st_date_1 and end_date_1,
and at the same time get title_2 if NOW is between st_date_2 and end_date_2,
and so on.

At present I have a denormalized schema like this; I can't figure out a
single Solr query to do this:

   [field definitions for title_1, st_date_1, end_date_1, title_2,
   st_date_2, end_date_2, ... not preserved in the archive]

Any help getting the correct query?

I also thought of using multivalued fields like title, st_date, and
end_date, but I am not sure whether two multivalued fields correlate with
each other. That is, if today's date does not fall between the first
st_date and the first end_date, then I don't want the first title in the
results. Is that possible?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/multivalue-or-denormalise-tp3197942p3197942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wiki Error JSON syntax

2011-07-25 Thread Stefan Matheis

Hi Remy,

so you may open an issue for this on the GitHub project? I mean .. just
creating another client, because you hit one problem, does not sound like
a good plan?


Regards
Stefan

On 25.07.2011 10:56, Remy Loubradou wrote:

Hey Stefan,

Thanks, but I already used this solr client and I got an error when I add
too many documents: "FATAL ERROR: JS Allocation failed - process out of
memory".
I couldn't find the source of the problem in that solr client, so I decided
to write my own, hopefully without this error. I'm also using JSON documents
and not XML documents; I read a post saying that I can get better
performance using JSON documents.

I will release this client as an npm module.

Regards,
Remy

2011/7/25 Stefan Matheis


Remy,

didn't use it myself .. but you know about https://github.com/gsf/node-solr ?

Regards
Stefan

On 20.07.2011 20:05, Remy Loubradou wrote:

I think I can trust you, but this is weird.

Funny thing: if you try to validate this JSON on http://jsonlint.com/,
duplicate keys are automatically removed. But the thing is, how can you
possibly generate this JSON from a JavaScript object?

It would be really nice to combine both ways that you show on the page.
Something like:

{
 "add": [
 {
 "doc": {
 "id": "DOC1",
 "my_boosted_field": {
 "boost": 2.3,
 "value": "test"
 },
 "my_multivalued_field": [
 "aaa",
 "bbb"
 ]
 }
 },
 {
 "commitWithin": 5000,
 "overwrite": false,
 "boost": 3.45,
 "doc": {
 "f1": "v2"
 }
 }
 ],
 "commit": {},
 "optimize": {
 "waitFlush": false,
 "waitSearcher": false
 },
 "delete": [
 {
 "id": "ID"
 },
 {
 "query": "QUERY"
 }
 ]
}

Thank you for your previous response, Yonik.

2011/7/20 Yonik Seeley




On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou wrote:


Hi,
I was writing a Solr Client API for Node and I found an error on this page,
http://wiki.apache.org/solr/UpdateJSON, in the section "Update Commands":
the JSON is not valid because there are duplicate keys, with "add" and
"delete" each appearing twice.



It's a common misconception that it's invalid JSON.  Duplicate keys
are in fact legal.

-Yonik
http://www.lucidimagination.com
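For reference, the duplicated-key form the wiki shows is ordinary UpdateJSON
(the field values here are illustrative):

{
  "add": {"doc": {"id": "DOC1"}},
  "add": {"doc": {"id": "DOC2"}},
  "commit": {}
}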

I tried with an array and it doesn't work either; I got error 400. I think
that's because the syntax is bad.

I don't really know if I am in the right place to talk about that, but ...
it's the only place I found. Sorry if it's not.

Thanks,

And I love Solr :)










Re: strip html from data

2011-07-25 Thread Markus Jelsma
I've seen that issue too and read comments on the list, yet I've never had
trouble with the order; I don't know what's going on. Check this analyzer,
I've moved the charFilter to the bottom:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
      generateNumberParts="1" catenateWords="1" generateWordParts="1"
      catenateAll="0" catenateNumbers="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      expand="true" ignoreCase="false"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
      ignoreCase="false"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"
      language="Dutch"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
</analyzer>
The analysis chain still does its job as I expect for the input:
<html>bla bla

Index Analyzer

org.apache.solr.analysis.HTMLStripCharFilterFactory {luceneMatchVersion=LUCENE_34}
text         bla bla

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
startOffset  6      10
endOffset    9      13

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_34,
generateWordParts=1, catenateAll=0, catenateNumbers=1}
position     1      2
term text    bla    bla
startOffset  6      10
endOffset    9      13
type         word   word

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
startOffset  6      10
endOffset    9      13
type         word   word

org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
expand=true, ignoreCase=false, luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
type         word   word
startOffset  6      10
endOffset    9      13

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=false, luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
type         word   word
startOffset  6      10
endOffset    9      13

org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
type         word   word
startOffset  6      10
endOffset    9      13

org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt,
language=Dutch, luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
keyword      false  false
type         word   word
startOffset  6      10
endOffset    9      13

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {luceneMatchVersion=LUCENE_34}
position     1      2
term text    bla    bla
keyword      false  false
type         word   word
startOffset  6      10
endOffset    9      13


On Monday 25 July 2011 18:07:29 Mike Sokolov wrote:
> Hmm - I'm not sure about that; see
> https://issues.apache.org/jira/browse/SOLR-2119
> 
> On 07/25/2011 12:01 PM, Markus Jelsma wrote:
> > charFilters are executed first regardless of their position in the
> > analyzer.
> > 
> > On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
> >> I think you need to list the charfilter earlier in the analysis chain;
> >> before the tokenizer.  Probably Solr should tell you this...
> >> 
> >> -Mike
> >> 
> >> On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:
> >>> sounds logical. I just changed it to the following, restarted and
> >>> reindexed with commit:
> >>>
> >>> <fieldType name="..." class="solr.TextField"
> >>>     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>>   <analyzer>
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>>     ...
> >>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >>>     ...
> >>>   </analyzer>
> >>>   <analyzer>
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>>     ...
> >>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >>>     ...
> >>>   </analyzer>
> >>> </fieldType>
> >>> Unfortunately that did not fix the error. There are still tags inside
> >>> the data. I believe there are fewer than before, but I cannot prove
> >>> that. Fact is, there are still html tags inside the data.
> >>> 
> >>> Any other ideas what the problem could be?
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 2011/7/25 Markus Jelsma
> >>> 
>  You've three analyzer elements; I wonder what that would do. You need to
>  add the char filter to the index-time analyzer.
>  
>  On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> > Hi there,
> > 
> > I am trying to strip html tags from the data before adding the
> > documents
>  
> >>

Re: dih fetching but not adding records to index

2011-07-25 Thread Gora Mohanty
On Mon, Jul 25, 2011 at 9:36 PM, abhayd  wrote:
> hi
>
> thanks for the response
>
> I am aware of post.sh, but I wanted to make use of DIH and scheduling. We
> cannot use cron due to some other issues.
>
> So I was thinking of using DataImport scheduling.
[...]

OK, though in that case the <field name="..."> elements are superfluous
in the XML file. Also, DIH uses the tag itself in the XML
file rather than the "name" attribute. Thus, your XML should
look like:
--
<root>
   <record>
      <id>3</id>
   </record>
   <record>
      <id>4</id>
   </record>
</root>
-

Regards,
Gora


Re: Ignore records that are missing a value in a field

2011-07-25 Thread Stefan Matheis

So, in other words, you want to exclude those records? :)
A filter query with a range query could help: fq=common_names:[* TO *]
keeps only documents that have a value (the negated form,
fq=-common_names:[* TO *], matches only the records without one).

Regards
Stefan
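For illustration, the full request could look like this (host and port as in
the Solr example setup, field name taken from the question):

http://localhost:8983/solr/select?q=*:*&fq=common_names:[*+TO+*]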

On 25.07.2011 17:53, Brian Lamb wrote:

Hi all,

I have an optional field called "common_names". I would like to keep this
field optional but at the same time, occasionally do a search where I do not
include results where there is no value set for this field. Is this possible
to do within solr?

In other words, I would like to do a search where if there is no value set
for common_names, I would not want that record included in the search
result.

Thanks,

Brian Lamb



Re: strip html from data

2011-07-25 Thread Mike Sokolov
Hmm - I'm not sure about that; see 
https://issues.apache.org/jira/browse/SOLR-2119


On 07/25/2011 12:01 PM, Markus Jelsma wrote:

charFilters are executed first regardless of their position in the analyzer.

On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
   

I think you need to list the charfilter earlier in the analysis chain;
before the tokenizer.  Probably Solr should tell you this...

-Mike

On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:
 

sounds logical. I just changed it to the following, restarted and reindexed
with commit:

<fieldType name="..." class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    ...
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...
  </analyzer>
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    ...
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...
  </analyzer>
</fieldType>

Unfortunately that did not fix the error. There are still tags inside
the data. I believe there are fewer than before, but I cannot prove that.
Fact is, there are still html tags inside the data.

Any other ideas what the problem could be?





2011/7/25 Markus Jelsma

   

You've three analyzer elements; I wonder what that would do. You need to
add the char filter to the index-time analyzer.

On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
 

Hi there,

I am trying to strip html tags from the data before adding the documents
to the index. To do that I altered schema.xml like this:

<fieldType name="..." class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

<field name="..." type="..." required="false"/>

Unfortunately this does not work; the html tags are still present after
restarting and reindexing. I also tried HTMLStripTransformer, but this did
not work either.

Does anybody have an idea how to get this done? Thank you in advance for
any hint.

Merlin
   

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
 
   


Re: dih fetching but not adding records to index

2011-07-25 Thread abhayd
hi

thanks for the response

I am aware of post.sh, but I wanted to make use of DIH and scheduling. We
cannot use cron due to some other issues.

So I was thinking of using DataImport scheduling.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/dih-fetching-but-not-adding-records-to-index-tp3189438p3197874.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: strip html from data

2011-07-25 Thread Markus Jelsma
charFilters are executed first regardless of their position in the analyzer.
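For example, the conventional declaration order puts the charFilter first,
then the tokenizer, then the filters; a minimal sketch:

<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>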

On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
> I think you need to list the charfilter earlier in the analysis chain;
> before the tokenizer.  Probably Solr should tell you this...
> 
> -Mike
> 
> On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:
> > sounds logical. I just changed it to the following, restarted and
> > reindexed with commit:
> >
> > <fieldType name="..." class="solr.TextField"
> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >     ...
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     ...
> >   </analyzer>
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     ...
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     ...
> >   </analyzer>
> > </fieldType>
> > 
> > Unfortunately that did not fix the error. There are still tags
> > inside the data. I believe there are fewer than before, but I cannot
> > prove that. Fact is, there are still html tags inside the data.
> > 
> > Any other ideas what the problem could be?
> > 
> > 
> > 
> > 
> > 
> > 2011/7/25 Markus Jelsma
> > 
> >> You've three analyzer elements; I wonder what that would do. You need to
> >> add the char filter to the index-time analyzer.
> >> 
> >> On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> >>> Hi there,
> >>> 
> >>> I am trying to strip html tags from the data before adding the documents
> >>> to the index. To do that I altered schema.xml like this:
> >>>
> >>> <fieldType name="..." class="solr.TextField"
> >>>     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>>   <analyzer>
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>>     ...
> >>>   </analyzer>
> >>>   <analyzer>
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>>     ...
> >>>   </analyzer>
> >>>   <analyzer>
> >>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     ...
> >>>   </analyzer>
> >>> </fieldType>
> >>>
> >>> <field name="..." type="..." required="false"/>
> >>> Unfortunately this does not work; the html tags are still present
> >>> after restarting and reindexing. I also tried HTMLStripTransformer,
> >>> but this did not work either.
> >>>
> >>> Does anybody have an idea how to get this done? Thank you in advance
> >>> for any hint.
> >>> 
> >>> Merlin
> >> 
> >> --
> >> Markus Jelsma - CTO - Openindex
> >> http://www.linkedin.com/in/markus17
> >> 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: strip html from data

2011-07-25 Thread Mike Sokolov
I think you need to list the charfilter earlier in the analysis chain; 
before the tokenizer.  Probably Solr should tell you this...


-Mike

On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:

sounds logical. I just changed it to the following, restarted and reindexed
with commit:

<fieldType name="..." class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    ...
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...
  </analyzer>
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    ...
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...
  </analyzer>
</fieldType>

Unfortunately that did not fix the error. There are still tags inside
the data. I believe there are fewer than before, but I cannot prove that.
Fact is, there are still html tags inside the data.

Any other ideas what the problem could be?





2011/7/25 Markus Jelsma

   

You've three analyzer elements; I wonder what that would do. You need to
add the char filter to the index-time analyzer.

On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
 

Hi there,

I am trying to strip html tags from the data before adding the documents
to the index. To do that I altered schema.xml like this:

<fieldType name="..." class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    ...
  </analyzer>
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

<field name="..." type="..." required="false"/>

Unfortunately this does not work; the html tags are still present after
restarting and reindexing. I also tried HTMLStripTransformer, but this did
not work either.

Does anybody have an idea how to get this done? Thank you in advance for
any hint.

Merlin
   

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

 
   


Ignore records that are missing a value in a field

2011-07-25 Thread Brian Lamb
Hi all,

I have an optional field called "common_names". I would like to keep this
field optional but at the same time, occasionally do a search where I do not
include results where there is no value set for this field. Is this possible
to do within solr?

In other words, I would like to do a search where if there is no value set
for common_names, I would not want that record included in the search
result.

Thanks,

Brian Lamb


Strange suggestions with spell checker

2011-07-25 Thread Jens Hoffrichter
Hello all,

I'm getting a strange suggestion for a purposely mistyped word in Solr 1.4.1

I search for the term "snia", and I would expect the term "sina" to be
suggested, as this is a fairly common word in quite a bit of the indexed
documents.

Instead, I'm getting india as a suggestion, which is only indexed once, and
has (at least as far as my understanding of the algorithm goes) a greater
Levenshtein distance than sina.

The configuration for the spellchecker is pretty straightforward, basically
taken directly from the examples:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <str name="comparatorClass">freq</str>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>

I have tried to use the comparatorClass there (as frequency would probably
yield better results for me), but only saw afterwards that it is only
available in Solr 4.

The complete suggestions I get from the standard search component are:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="snia">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">india</str>
          <int name="freq">1</int>
        </lst>
        <lst>
          <str name="word">sina</str>
          <int name="freq">30</int>
        </lst>
        <lst>
          <str name="word">soa</str>
          <int name="freq">4</int>
        </lst>
        <lst>
          <str name="word">unit</str>
          <int name="freq">3</int>
        </lst>
        <lst>
          <str name="word">sei</str>
          <int name="freq">2</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>

Apart from the india suggestions, the other ones are okay, though I need to
tune my stopwords for the (German) indexer a bit more.

Is there any explanation why india is chosen over sina in the suggestions?
Is there anything I can tweak in the configuration to get the desired
result?

If some information is missing, don't hesitate to ask; I will try to supply
it then.

Many thanks in advance,
Jens


RE: Spellcheck compounded words

2011-07-25 Thread Dyer, James
Related to this is this jira issue: 
https://issues.apache.org/jira/browse/SOLR-2585 . With this patch, Solr will 
consider alternatives in cases where a word is misspelled in its context, but 
nevertheless exists in the index and/or dictionary.  This is a work-in-progress 
and is for trunk only, but would make for another nice incremental improvement 
in the spellchecker.

This patch won't solve the problem at hand, but it may make the shingle 
workaround function in a few more cases.  Of course actually developing 
word-break-analysis into the spellchecker would be the right solution...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Monday, July 25, 2011 10:13 AM
To: solr-user@lucene.apache.org
Cc: Dyer, James
Subject: Re: Spellcheck compounded words

This will work for misspelled compounds indeed, but not when the compound word
is actually queried as two separate correctly spelled words. Most likely both
sail and boat exist in the index as single tokens.

There is a workaround, but it is limited to a scenario where users never use
more than one query term (or two in the case of misspelled compounds). When
your index has shingles and you replace the whitespace with a non-whitespace
character, you get a proper suggestion returned. The compound is then found as
a suggestion but not in the collation.

When queries contain more than two terms, it most likely will never work this
way. The results get really strange.

On Monday 25 July 2011 16:49:18 Dyer, James wrote:
> I'm afraid there currently isn't much support for correcting misplaced
> whitespace.  Solr is going to look at each word individually and won't
> even try to combine adjacent words (or split a word into 2 or more).  So
> there is no good way to get these kinds of suggestions.
> 
> One thing that might work in some cases is to create a spelling dictionary
> composed of shingles (2+ words indexed together as 1 token).  This
> approach is described in Smiley&Pugh's Solr book, (1st ed) p.180ff under
> the heading "An alternative approach".  I haven't tried this but it might
> be your best hope if this is a feature you've absolutely got to have.
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: O. Klein [mailto:kl...@octoweb.nl]
> Sent: Friday, July 22, 2011 8:11 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck compounded words
> 
> How do I get spellchecker to suggest compounded words?
> 
> Like. q=sail booat
> 
> and suggestion/collate is "sailboat" and "sail boat"
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3
> 192748.html Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Spellcheck compounded words

2011-07-25 Thread Markus Jelsma
This will work for misspelled compounds indeed, but not when the compound word
is actually queried as two separate correctly spelled words. Most likely both
sail and boat exist in the index as single tokens.

There is a workaround, but it is limited to a scenario where users never use
more than one query term (or two in the case of misspelled compounds). When
your index has shingles and you replace the whitespace with a non-whitespace
character, you get a proper suggestion returned. The compound is then found as
a suggestion but not in the collation.

When queries contain more than two terms, it most likely will never work this
way. The results get really strange.
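For illustration, a shingle-based spelling field along these lines might be
declared like this (the type name is hypothetical):

<fieldType name="textSpellShingle" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="true"/>
  </analyzer>
</fieldType>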

On Monday 25 July 2011 16:49:18 Dyer, James wrote:
> I'm afraid there currently isn't much support for correcting misplaced
> whitespace.  Solr is going to look at each word individually and won't
> even try to combine adjacent words (or split a word into 2 or more).  So
> there is no good way to get these kinds of suggestions.
> 
> One thing that might work in some cases is to create a spelling dictionary
> composed of shingles (2+ words indexed together as 1 token).  This
> approach is described in Smiley&Pugh's Solr book, (1st ed) p.180ff under
> the heading "An alternative approach".  I haven't tried this but it might
> be your best hope if this is a feature you've absolutely got to have.
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: O. Klein [mailto:kl...@octoweb.nl]
> Sent: Friday, July 22, 2011 8:11 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck compounded words
> 
> How do I get spellchecker to suggest compounded words?
> 
> Like. q=sail booat
> 
> and suggestion/collate is "sailboat" and "sail boat"
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3
> 192748.html Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Schema Design/Data Import

2011-07-25 Thread Stefan Matheis

On 25.07.2011 16:58, Erick Erickson wrote:

Well, the attachment_1, attachment_2 idea would be awkward
to form queries (i.e. there would be 100 clauses if there were 100 docs?)
Dynamic fields have this same problem.


Oh, yes .. correct .. overlooked that part :/ sorry.


Re: filter query parameter not working as expected

2011-07-25 Thread Erick Erickson
Not that I know of, although it does give you the parsed fq results, which
you could then use as query parameters (i.e. the 'q' parameter) for debugging...

You have to use parens or fully qualify each term
(e.g. WAY_ANALYZED:rue WAY_ANALYZED:de), that's just how the
query parsing works...

Best
Erick

On Mon, Jul 25, 2011 at 10:49 AM, elisabeth benoit
 wrote:
> thanks
>
> using parenthesis
>
> select?&q=VINCI Park&fq=WAY_ANALYZED:(rue de l hotel de ville) AND
> (TOWN_ANALYZED:paris OR
> DEPARTEMENT_ANALYZED:paris)&rows=200&fl=NAME,TOWN,WAY,score&debugQuery=on
>
> works
>
> but I would rather not use parenthesis or AND between those words
>
> this brings another question: debugQuery=on doesn't give me any information
> about fq parameter match. only about q parameter match.
>
> Is there a way to have debug information about fq parameter match?
>
> Best regards,
> Elisabeth
>
>
>
> 2011/7/25 Erick Erickson 
>
>> Well, WAY_ANALYZED:de l hotel de ville parses as
>> WAY_ANALYZED:de default:l default:hotel default:de default:ville
>>
>> You probably want something like WAY_ANALYZED:(de l hotel de ville),
>> perhaps with AND between them. Try adding &debugQuery=on to your
>> queries and you can sometimes see this kind of thing...
>>
>> Best
>> Erick
>>
>> On Thu, Jul 21, 2011 at 3:23 AM, elisabeth benoit
>>  wrote:
>> > Hello,
>> >
>> > There is something I don't quite get with fq parameter.
>> >
>> > I have this query
>> >
>> > select?&q=VINCI Park&fq=WAY_ANALYZED:de l hotel de ville AND
>> > (TOWN_ANALYZED:paris OR DEPARTEMENT_ANALYZED:paris)&rows=200&fl=*,score
>> >
>> > and two answers. One having WAY_ANALYZED = 48 r de l'hôtel de ville,
>> which
>> > is ok
>> >
>> > and the other called Vinci Park but having WAY_ANALYZED = 143 r lecourbe.
>> >
>> > Is there something I didn't understand about fq parameter?
>> >
>> > I'm using Solr 3.2.
>> >
>> > Thanks,
>> > Elisabeth Benoit
>> >
>>
>


Re: Schema Design/Data Import

2011-07-25 Thread Erick Erickson
Well, the attachment_1, attachment_2 idea would be awkward
to form queries (i.e. there would be 100 clauses if there were 100 docs?)
Dynamic fields have this same problem.

You could certainly index them all into a big field, just make it
multivalued and do a SolrDocument.add("bigtextfield", docContents) for
each document. Watch out for the maxFieldLength parameter in solrconfig.xml,
you'll want to bump that way up.

You could also index a separate document for each attachment, then
perhaps use the grouping/field collapsing feature to gather them all
together, depending upon your requirements.

I'd either put them all in one field or use a separate solr document for each
row/attachment pair as a first approach...
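For illustration, a hedged sketch of the single-big-field variant (field and
type names are placeholders): in schema.xml, a multivalued catch-all field,

<field name="attachment" type="text" indexed="true" stored="false"
       multiValued="true"/>

and in solrconfig.xml a raised cap so long attachments are not silently
truncated:

<maxFieldLength>2147483647</maxFieldLength>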

Hope that helps
Erick

On Mon, Jul 25, 2011 at 10:36 AM, Travis Low  wrote:
> Thanks so much Erick (and Stefan).  Yes, I did some reading on SolrJ and
> Tika and you are spot-on.  We will write our own importer using SolrJ and
> then we can grab the DB records and parse any attachments along the way.
>
> Now it comes down to a schema design question.  The issue I'm struggling
> with is what kind of field or fields to use for the attachments.  The reason
> for the difficulty is that the documents we're most interested in are the DB
> records, not the attachments, and there could be 0 or 3 or 50 attachments
> for a single DB record.  Should we:
>
> (1) Just add fields called "attachment_0", "attachment_1", ... ,
> "attachment_100" to the schema?
> (2) Somehow index all attachments to a single field? (Is this even
> possible?)
> (3) Use dynamic fields?
> (4) None of the above?
>
> The idea is that if there is a hit in one of the attachments, then we need
> to show a link to the DB record.  It would be nice to show a link to the
> document as well, but that's less important.
>
> cheers,
>
> Travis
>
>
> On Mon, Jul 25, 2011 at 9:49 AM, Erick Erickson 
> wrote:
>
>> I'd seriously consider going with SolrJ as your indexing strategy, it
>> allows
>> you to do anything you need to do in Java code. You can call the Tika
>> library yourself on the files pointed to by your rows as you see fit,
>> indexing
>> them as you choose, perhaps one Solr doc per attachment, perhaps one
>> per row, whatever.
>>
>> Best
>> Erick
>>
>> On Wed, Jul 20, 2011 at 3:27 PM,   wrote:
>> >
>> > [Apologies if this is a duplicate -- I have sent several messages from my
>> work email and they just vanish, so I subscribed with my personal email]
>> >
>> > Greetings.  I am struggling to design a schema and a data import/update
>>  strategy for some semi-complicated data.  I would appreciate any input.
>> >
>> > What we have is a bunch of database records that may or may not have
>> files attached.  Sometimes no files, sometimes 50.
>> >
>> > The requirement is to index the database records AND the documents,  and
>> the search results would be just links to the database records.
>> >
>> > I'd  love to crawl the site with Nutch and be done with it, but we have a
>>  complicated search form with various codes and attributes for the  database
>> records, so we need a detailed schema that will loosely  correspond to boxes
>> on the search form.  I don't think we could easily  do that if we just crawl
>> the site.  But with a detailed schema, I'm  having trouble understanding how
>> we could import and index from the  database, and also index the related
>> files, and have the same schema  being populated, especially with the number
>> of related documents being  variable (maybe index them all to one field?).
>> >
>> > We have a lot of flexibility on how we can build this, so I'm open  to
>> any suggestions or pointers for further reading.  I've spent a fair  amount
>> of time on the wiki but I didn't see anything that seemed  directly
>> relevant.
>> >
>> > An additional difficulty, that I am willing to overlook for the  first
>> cut, is that some of these files are zipped, and some of the zip  files may
>> contain other zip files, to maybe 3 or 4 levels deep.
>> >
>> > Help, please?
>> >
>> > cheers,
>> >
>> > Travis
>>
>
>
>
> --
>
> **
>
> *Travis Low, Director of Development*
>
>
> ** * *
>
> *Centurion Research Solutions, LLC*
>
> *14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*
>
> *703-956-6276 *•* 703-378-4474 (fax)*
>
> *http://www.centurionresearch.com* 
>
> **The information contained in this email message is confidential and
> protected from disclosure.  If you are not the intended recipient, any use
> or dissemination of this communication, including attachments, is strictly
> prohibited.  If you received this email message in error, please delete it
> and immediately notify the sender.
>
> This email message and any attachments have been scanned and are believed to
> be free of malicious software and defects that might affect any computer
> system in which they are received and opened. No responsibility is accepted
> by Centurion Research Solutions, LLC for any loss or damage

Re: Schema Design/Data Import

2011-07-25 Thread Stefan Matheis

Travis,

that sounds like a perfect usecase for dynamic fields .. attachment_* 
and there you go. works for no attachment, as well as one, three or 50.


for the user interface, you could iterate over them and show them as 
list - or something else that would fit your need.


also, maybe, you would have attachment_name_* and attachment_body_* ? 
otherwise the information, which (file-)name relates to which body would 
be lost .. at least on the solr-level.


Regards
Stefan

On 25.07.2011 16:36, Travis Low wrote:

Thanks so much Erick (and Stefan).  Yes, I did some reading on SolrJ and
Tika and you are spot-on.  We will write our own importer using SolrJ and
then we can grab the DB records and parse any attachments along the way.

Now it comes down to a schema design question.  The issue I'm struggling
with is what kind of field or fields to use for the attachments.  The reason
for the difficulty is that the documents we're most interested in are the DB
records, not the attachments, and there could be 0 or 3 or 50 attachments
for a single DB record.  Should we:

(1) Just add fields called "attachment_0", "attachment_1", ... ,
"attachment_100" to the schema?
(2) Somehow index all attachments to a single field? (Is this even
possible?)
(3) Use dynamic fields?
(4) None of the above?

The idea is that if there is a hit in one of the attachments, then we need
to show a link to the DB record.  It would be nice to show a link to the
document as well, but that's less important.

cheers,

Travis


On Mon, Jul 25, 2011 at 9:49 AM, Erick Erickson wrote:


I'd seriously consider going with SolrJ as your indexing strategy, it
allows
you to do anything you need to do in Java code. You can call the Tika
library yourself on the files pointed to by your rows as you see fit,
indexing
them as you choose, perhaps one Solr doc per attachment, perhaps one
per row, whatever.

Best
Erick

On Wed, Jul 20, 2011 at 3:27 PM,  wrote:


[Apologies if this is a duplicate -- I have sent several messages from my

work email and they just vanish, so I subscribed with my personal email]


Greetings.  I am struggling to design a schema and a data import/update

  strategy for some semi-complicated data.  I would appreciate any input.


What we have is a bunch of database records that may or may not have

files attached.  Sometimes no files, sometimes 50.


The requirement is to index the database records AND the documents,  and

the search results would be just links to the database records.


I'd  love to crawl the site with Nutch and be done with it, but we have a

  complicated search form with various codes and attributes for the  database
records, so we need a detailed schema that will loosely  correspond to boxes
on the search form.  I don't think we could easily  do that if we just crawl
the site.  But with a detailed schema, I'm  having trouble understanding how
we could import and index from the  database, and also index the related
files, and have the same schema  being populated, especially with the number
of related documents being  variable (maybe index them all to one field?).


We have a lot of flexibility on how we can build this, so I'm open  to

any suggestions or pointers for further reading.  I've spent a fair  amount
of time on the wiki but I didn't see anything that seemed  directly
relevant.


An additional difficulty, that I am willing to overlook for the  first

cut, is that some of these files are zipped, and some of the zip  files may
contain other zip files, to maybe 3 or 4 levels deep.


Help, please?

cheers,

Travis








Re: problem with "?" wild card searches in solr

2011-07-25 Thread Tomás Fernández Löbbe
Are you using stemming on that field? Sometimes stemming and wildcards don't
get along very well. If you are, take a look at how the terms that should
match "ban?le" are analyzed on the Analysis section of the admin.

On Sat, Jul 23, 2011 at 6:33 AM, Romi  wrote:

> I am using solr for search . i am facing problem with wildcard searches.
> when i search for dia?mond i get result for diamond
> but when i search for ban?le i get no result.
>
> what can be the problem
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/problem-with-wild-card-searches-in-solr-tp3193222p3193222.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit time and lock

2011-07-25 Thread Erick Erickson
Yeah, the 1.4 code base is "older". That is, optimization will have more
effect on that vintage code than on 3.x and trunk code.

I should have been a bit more explicit in that other thread. In the case
where you add a bunch of documents, optimization doesn't buy you all
that much currently. If you delete a bunch of docs (or update a bunch of
existing docs), then optimization will reclaim resources. So you *could*
have a case where the size of your index shrank drastically after
optimization (say you updated the same 100K documents 10 times then
optimized).

But even that is "it depends" (tm). The new segment merging, as I remember,
will possibly reclaim deleted resources, but I'm parroting people who actually
know, so you might want to verify that if it matters to you.

Optimization will almost certainly trigger a complete index replication to any
slaves configured, though.

So the usual advice is to optimize maybe once a day or week during off hours
as a starting point unless and until you can verify that your
particular situation
warrants optimizing more frequently.
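For reference, an explicit optimize during off hours can be issued as a plain
update request, e.g.:

curl 'http://localhost:8983/solr/update?optimize=true'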

Best
Erick

On Fri, Jul 22, 2011 at 11:53 AM, Jonathan Rochkind  wrote:
> How old is 'older'?  I'm pretty sure I'm still getting much faster 
> performance on an optimized index in Solr 1.4.
>
> This could be due to the nature of my index and queries (which include some 
> medium sized stored fields, and extensive facetting -- facetting on up to a 
> dozen fields in every request, where each field can include millions of 
> unique values. Amazing I can do this with good performance at all!).
>
> It's also possible I'm wrong about that faster performance; I haven't done 
> robustly valid benchmarking on a clone of my production index yet. But it 
> really looks that way to me, from what investigation I have done.
>
> If the answer is that optimization is believed no longer necessary on 
> versions LATER than 1.4, that might be the simplest explanation.
> 
> From: Pierre GOSSE [pierre.go...@arisem.com]
> Sent: Friday, July 22, 2011 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: RE: commit time and lock
>
> Hi Mark
>
> I've read that in a thread title " Weird optimize performance degradation", 
> where Erick Erickson states that "Older versions of Lucene would search 
> faster on an optimized index, but this is no longer necessary.", and more 
> recently in a thread you initiated a month ago "Question about optimization".
>
> I'll also be very interested if anyone had a more precise idea/datas of 
> benefits and tradeoff of optimize vs merge ...
>
> Pierre
>
>
> -Original Message-
> From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
> Sent: Friday, July 22, 2011 15:45
> To: solr-user@lucene.apache.org
> Subject: Re: commit time and lock
>
> Hello,
>
> Pierre, can you tell us where you read that?
> "I've read here that optimization is not always a requirement to have an
> efficient index, due to some low level changes in lucene 3.xx"
>
> Marc.
>
> On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE wrote:
>
>> Solr will respond to searches during optimization, but commits will have to
>> wait for the end of the optimization process.
>>
>> During optimization a new index is generated on disk by merging every
>> single file of the current index into one big file, so your server will be
>> busy, especially regarding disk access. This may alter your response time
>> and has very negative effect on the replication of index if you have a
>> master/slave architecture.
>>
>> I've read here that optimization is not always a requirement to have an
>> efficient index, due to some low level changes in lucene 3.xx, so maybe you
>> don't really need optimization. What version of solr are you using ? Maybe
>> someone can point toward a relevant link about optimization other than solr
>> wiki
>> http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
>>
>> Pierre
>>
>>
>> -Original Message-
>> From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
>> Sent: Friday, July 22, 2011 12:45
>> To: solr-user@lucene.apache.org
>> Subject: Re: commit time and lock
>>
>> Thanks for clarity.
>>
>> One more thing I want to know about optimization.
>>
>> Right now I am planning to optimize the server every 24 hours. Optimization
>> also takes time (last time it took around 13 minutes), so I want to know:
>>
>> 1. While optimization is in progress, will the solr server respond or not?
>> 2. If the server will not respond, how can I do the optimization faster, or
>> in another way, so our users will not have to wait for the optimization
>> process to finish?
>>
>> regards
>> Jonty
>>
>>
>>
>> On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE wrote:
>>
>> > Solr still responds to search queries during commit; only new indexing
>> > requests will have to wait (until the end of the commit?). So I don't
>> > think your users will experience increased response time during commits

RE: Spellcheck compounded words

2011-07-25 Thread Dyer, James
I'm afraid there currently isn't much support for correcting misplaced 
whitespace.  Solr is going to look at each word individually and won't even try 
to combine adjacent words (or split a word into 2 or more).  So there is no good 
way to get these kinds of suggestions.

One thing that might work in some cases is to create a spelling dictionary 
composed of shingles (2+ words indexed together as 1 token).  This approach is 
described in Smiley&Pugh's Solr book, (1st ed) p.180ff under the heading "An 
alternative approach".  I haven't tried this but it might be your best hope if 
this is a feature you've absolutely got to have.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Friday, July 22, 2011 8:11 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck compounded words

How do I get spellchecker to suggest compounded words?

Like. q=sail booat

and suggestion/collate is "sailboat" and "sail boat"

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3192748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: filter query parameter not working as expected

2011-07-25 Thread elisabeth benoit
thanks

using parentheses

select?&q=VINCI Park&fq=WAY_ANALYZED:(rue de l hotel de ville) AND
(TOWN_ANALYZED:paris OR
DEPARTEMENT_ANALYZED:paris)&rows=200&fl=NAME,TOWN,WAY,score&debugQuery=on

works

but I would rather not use parentheses or AND between those words

this brings another question: debugQuery=on doesn't give me any information
about fq parameter match. only about q parameter match.

Is there a way to have debug information about fq parameter match?

Best regards,
Elisabeth



2011/7/25 Erick Erickson 

> Well, WAY_ANALYZED:de l hotel de ville parses as
> WAY_ANALYZED:de default:l default:hotel default:de default:ville
>
> You probably want something like WAY_ANALYZED:(de l hotel de ville),
> perhaps with AND between them. Try adding &debugQuery=on to your
> queries and you can sometimes see this kind of thing...
>
> Best
> Erick
>
> On Thu, Jul 21, 2011 at 3:23 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > There is something I don't quite get with fq parameter.
> >
> > I have this query
> >
> > select?&q=VINCI Park&fq=WAY_ANALYZED:de l hotel de ville AND
> > (TOWN_ANALYZED:paris OR DEPARTEMENT_ANALYZED:paris)&rows=200&fl=*,score
> >
> > and two answers. One having WAY_ANALYZED = 48 r de l'hôtel de ville,
> which
> > is ok
> >
> > and the other called Vinci Park but having WAY_ANALYZED = 143 r lecourbe.
> >
> > Is there something I didn't understand about fq parameter?
> >
> > I'm using Solr 3.2.
> >
> > Thanks,
> > Elisabeth Benoit
> >
>


using distributed search with the suggest component

2011-07-25 Thread Tobias Rübner
Hi,

I try to use the suggest component (solr 3.3) with multiple cores.
I added a search component and a request handler as described in the docs (
http://wiki.apache.org/solr/Suggester) to my solrconfig.
That works fine for 1 core but querying my solr instance with the shards
parameter does not query multiple cores.
It just ignores the shards parameter.
http://localhost:/solr/core1/suggest?q=sa&shards=localhost:/solr/core1,localhost:/solr/core2

The documentation of the SpellCheckComponent (
http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support)
is a bit vague on that point, because I don't know if this feature really
works with Solr 3.3. It is targeted for Solr 1.5, which will never come, but
it says it is now available.
I also tried the shards.qt parameter, but it does not change my results.

Thanks for any help,
Tobias


Re: Schema Design/Data Import

2011-07-25 Thread Travis Low
Thanks so much Erick (and Stefan).  Yes, I did some reading on SolrJ and
Tika and you are spot-on.  We will write our own importer using SolrJ and
then we can grab the DB records and parse any attachments along the way.

Now it comes down to a schema design question.  The issue I'm struggling
with is what kind of field or fields to use for the attachments.  The reason
for the difficulty is that the documents we're most interested in are the DB
records, not the attachments, and there could be 0 or 3 or 50 attachments
for a single DB record.  Should we:

(1) Just add fields called "attachment_0", "attachment_1", ... ,
"attachment_100" to the schema?
(2) Somehow index all attachments to a single field? (Is this even
possible?)
(3) Use dynamic fields?
(4) None of the above?

The idea is that if there is a hit in one of the attachments, then we need
to show a link to the DB record.  It would be nice to show a link to the
document as well, but that's less important.

cheers,

Travis


On Mon, Jul 25, 2011 at 9:49 AM, Erick Erickson wrote:

> I'd seriously consider going with SolrJ as your indexing strategy, it
> allows
> you to do anything you need to do in Java code. You can call the Tika
> library yourself on the files pointed to by your rows as you see fit,
> indexing
> them as you choose, perhaps one Solr doc per attachment, perhaps one
> per row, whatever.
>
> Best
> Erick
>
> On Wed, Jul 20, 2011 at 3:27 PM,   wrote:
> >
> > [Apologies if this is a duplicate -- I have sent several messages from my
> work email and they just vanish, so I subscribed with my personal email]
> >
> > Greetings.  I am struggling to design a schema and a data import/update
>  strategy for some semi-complicated data.  I would appreciate any input.
> >
> > What we have is a bunch of database records that may or may not have
> files attached.  Sometimes no files, sometimes 50.
> >
> > The requirement is to index the database records AND the documents,  and
> the search results would be just links to the database records.
> >
> > I'd  love to crawl the site with Nutch and be done with it, but we have a
>  complicated search form with various codes and attributes for the  database
> records, so we need a detailed schema that will loosely  correspond to boxes
> on the search form.  I don't think we could easily  do that if we just crawl
> the site.  But with a detailed schema, I'm  having trouble understanding how
> we could import and index from the  database, and also index the related
> files, and have the same schema  being populated, especially with the number
> of related documents being  variable (maybe index them all to one field?).
> >
> > We have a lot of flexibility on how we can build this, so I'm open  to
> any suggestions or pointers for further reading.  I've spent a fair  amount
> of time on the wiki but I didn't see anything that seemed  directly
> relevant.
> >
> > An additional difficulty, that I am willing to overlook for the  first
> cut, is that some of these files are zipped, and some of the zip  files may
> contain other zip files, to maybe 3 or 4 levels deep.
> >
> > Help, please?
> >
> > cheers,
> >
> > Travis
>



-- 

**

*Travis Low, Director of Development*


** * *

*Centurion Research Solutions, LLC*

*14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

*703-956-6276 *•* 703-378-4474 (fax)*

*http://www.centurionresearch.com* 

**The information contained in this email message is confidential and
protected from disclosure.  If you are not the intended recipient, any use
or dissemination of this communication, including attachments, is strictly
prohibited.  If you received this email message in error, please delete it
and immediately notify the sender.

This email message and any attachments have been scanned and are believed to
be free of malicious software and defects that might affect any computer
system in which they are received and opened. No responsibility is accepted
by Centurion Research Solutions, LLC for any loss or damage arising from the
content of this email.


Re: Rounding errors in solr

2011-07-25 Thread Brian Lamb
Yes, and that's causing some problems in my application. Is there a way to
truncate the score at the 7th decimal place when sorting by score?

On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley wrote:

> On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb
>  wrote:
> > I've noticed some peculiar scoring issues going on in my application. For
> > example, I have a field that is multivalued and has several records that
> > have the same value. For example,
> >
> > 
> >  National Society of Animal Lovers
> >  Nat. Soc. of Ani. Lov.
> > 
> >
> > I have about 300 records with that exact value.
> >
> > Now, when I do a search for references:(national society animal lovers),
> I
> > get the following results:
> >
> > 252
> > 159
> > 82
> > 452
> > 105
> >
> > When I do a search for references:(nat soc ani lov), I get the results
> > ordered differently:
> >
> > 510
> > 122
> > 501
> > 82
> > 252
> >
> > When I load all the records that match, I notice that at some point, the
> > scores aren't the same but differ by only a little:
> >
> > 1.471928 in one and the one before it was 1.471929
>
> 32 bit floats only have 7 decimal digits of precision, and in floating
> point land (a+b+c) can be slightly different than (c+b+a)
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Frange Function Query

2011-07-25 Thread Erick Erickson
I'm no expert on frange, but fq clauses are intersections. So if your
two frange queries have no terms in common, you won't get anything.

You can think of it as an implied AND between all the fq clauses you specify...
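If the intent was a union of the two id ranges rather than an intersection,
one hedged alternative is a single fq that combines ordinary range queries:

fq=id:[33787806 TO 33787918] OR id:[40817415 TO *]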

Best
Erick

On Thu, Jul 21, 2011 at 5:29 AM, Rohit Gupta  wrote:
> Hi,
>
> I have the following query in which I am using !frange function twice and the
> query is not returning any results. However if i use a single !frange function
> then the results come for the same query.
>
> Is it now possible to execute two franges in a single query?
>
> q="woolmark"&fq={!frange l=33787806 u=33787918}id&fq={!frange
> l=40817415}id&fq=createdOnGMTDate:[2011-07-01T14%3A30%3A00Z+TO+2011-07-21T14%3A30%3A00Z]
>
>
> Regards,
> Rohit


Re: filter query parameter not working as expected

2011-07-25 Thread Erick Erickson
Well, WAY_ANALYZED:de l hotel de ville parses as
WAY_ANALYZED:de default:l default:hotel default:de default:ville

You probably want something like WAY_ANALYZED:(de l hotel de ville),
perhaps with AND between them. Try adding &debugQuery=on to your
queries and you can sometimes see this kind of thing...

Best
Erick

On Thu, Jul 21, 2011 at 3:23 AM, elisabeth benoit
 wrote:
> Hello,
>
> There is something I don't quite get with fq parameter.
>
> I have this query
>
> select?&q=VINCI Park&fq=WAY_ANALYZED:de l hotel de ville AND
> (TOWN_ANALYZED:paris OR DEPARTEMENT_ANALYZED:paris)&rows=200&fl=*,score
>
> and two answers. One having WAY_ANALYZED = 48 r de l'hôtel de ville, which
> is ok
>
> and the other called Vinci Park but having WAY_ANALYZED = 143 r lecourbe.
>
> Is there something I didn't understand about fq parameter?
>
> I'm using Solr 3.2.
>
> Thanks,
> Elisabeth Benoit
>


Re: Getting a wierd Class Not Found Exception: SolrParams

2011-07-25 Thread Erick Erickson
Well, MultiMapSolrParams is a subclass of SolrParams, so you actually
do use it in your code 

But this looks like a classpath problem. You say your code compiles,
but do you make all the jars you path to during compilation available
to your servlet? And/or do you have any old jar files in your classpath?

Best
Erick

On Thu, Jul 21, 2011 at 3:00 AM, Sowmya V.B.  wrote:
> Hi All
>
> I have been getting this wierd error since yday evening, whose cause I am
> not able to figure out.
> I made a webinterface to read and display Solr Results, which is a servlet
> that calls Solr Servlet.
> I am
>
> I give the query to Solr, using:
> MultiMapSolrParams solrparamsmini =
> SolrRequestParsers.parseQueryString(queryrequest.toString());
> -where queryrequest contains all the ingredients of a Solr query.
>
> Eg:   StringBuffer queryrequest = new StringBuffer();
>        queryrequest.append("&q=" + query);
>
> queryrequest.append("&start=0&rows=30&hl=true&hl.fl=text&hl.frag=500&defType=dismax");
>
> queryrequest.append("&bq="+Field1+":["+frompercent+"%20TO%20"+topercent+"]");
>
> It compiles and builds without errors, but I get this error
> "java.lang.ClassNotFoundException:
> org.apache.solr.common.params.SolrParams", when I run the app.
> But, I dont use SolrParams class anywhere in my code!
>
> Here is the stack trace:
> INFO: Server startup in 1953 ms
> Jul 21, 2011 8:52:20 AM org.apache.catalina.core.ApplicationContext log
> INFO: Marking servlet solrsearch as unavailable
> Jul 21, 2011 8:52:20 AM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Allocate exception for servlet solrsearch
> java.lang.ClassNotFoundException: org.apache.solr.common.params.SolrParams
>    at
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1676)
>    at
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1521)
>    at java.lang.Class.getDeclaredConstructors0(Native Method)
>    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
>    at java.lang.Class.getConstructor0(Class.java:2699)
>    at java.lang.Class.newInstance0(Class.java:326)
>    at java.lang.Class.newInstance(Class.java:308)
>    at
> org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:119)
>    at
> org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1062)
>    at
> org.apache.catalina.core.StandardWrapper.allocate(StandardWrapper.java:813)
>    at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:135)
>    at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
>    at
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
>    at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
>    at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
>    at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
>    at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>    at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:395)
>    at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:250)
>    at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
>    at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
>    at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:680)
>
>
> Anyone had this kind of issue before?
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


highlighting fragsize

2011-07-25 Thread jame vaalet
hi
when you highlight and get back snippet fragments, can you overwrite the
default hl.regex.pattern through the URL? Can someone quote an example URL
of that sort?

What if I pass hl.slop=0? Will this stop the regex pattern from being
considered at all?


-- 

-JAME
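For reference, the regex fragmenter parameters can be overridden per request;
a sketch (the pattern is illustrative and must be URL-encoded in practice):

http://localhost:8983/solr/select?q=foo&hl=true&hl.fl=text&hl.fragmenter=regex&hl.fragsize=100&hl.regex.slop=0.5&hl.regex.pattern=[-\w ,]{20,200}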


Re: Schema Design/Data Import

2011-07-25 Thread Erick Erickson
I'd seriously consider going with SolrJ as your indexing strategy, it allows
you to do anything you need to do in Java code. You can call the Tika
library yourself on the files pointed to by your rows as you see fit, indexing
them as you choose, perhaps one Solr doc per attachment, perhaps one
per row, whatever.

Best
Erick
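For illustration, a minimal SolrJ + Tika sketch along these lines (class,
field and path names are hypothetical):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class RecordIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Tika tika = new Tika();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "record-42");          // the DB primary key
        doc.addField("title", "Example record");  // a column from the DB row

        // extract plain text from each attachment into a multivalued field
        for (File f : new File("/path/to/attachments").listFiles()) {
            doc.addField("attachment", tika.parseToString(f));
        }

        solr.add(doc);
        solr.commit();
    }
}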

On Wed, Jul 20, 2011 at 3:27 PM,   wrote:
>
> [Apologies if this is a duplicate -- I have sent several messages from my 
> work email and they just vanish, so I subscribed with my personal email]
>
> Greetings. I am struggling to design a schema and a data import/update
> strategy for some semi-complicated data. I would appreciate any input.
>
> What we have is a bunch of database records that may or may not have files
> attached. Sometimes no files, sometimes 50.
>
> The requirement is to index the database records AND the documents, and the
> search results would be just links to the database records.
>
> I'd love to crawl the site with Nutch and be done with it, but we have a
> complicated search form with various codes and attributes for the database
> records, so we need a detailed schema that will loosely correspond to boxes
> on the search form. I don't think we could easily do that if we just crawl
> the site. But with a detailed schema, I'm having trouble understanding how
> we could import and index from the database, and also index the related
> files, and have the same schema being populated, especially with the number
> of related documents being variable (maybe index them all to one field?).
>
> We have a lot of flexibility on how we can build this, so I'm open to any
> suggestions or pointers for further reading. I've spent a fair amount of
> time on the wiki but I didn't see anything that seemed directly relevant.
>
> An additional difficulty, that I am willing to overlook for the first cut,
> is that some of these files are zipped, and some of the zip files may
> contain other zip files, to maybe 3 or 4 levels deep.
>
> Help, please?
>
> cheers,
>
> Travis


Re: - character in search query

2011-07-25 Thread Erick Erickson
dismax is a fairly narrow-use parser. By that I mean it was created
to solve a specific issue. It has some pronounced warts as you've
discovered.

edismax is the preferred parser if you have access to it. I'd just
ignore dismax if you have access to edismax. There's been some
talk of deprecating dismax in favor of edismax in fact.
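Switching at query time is just a request parameter; a sketch (the field
name "name" is taken from your debug output, the rest of the URL is an
assumption):

http://localhost:8983/solr/select?defType=edismax&qf=name&q=arsenal\-london&debugQuery=on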

But if you really want to know, see:
https://issues.apache.org/jira/browse/SOLR-1553

Best
Erick

On Wed, Jul 20, 2011 at 5:10 AM, roySolr  wrote:
> When I use the edismax handler the escaping works great (before, I used the
> dismax handler). The debugQuery shows me this:
>
> +((DisjunctionMaxQuery((name:arsenal)~1.0)
> DisjunctionMaxQuery((name:london)~1.0))~2
>
> The "\" is not in the parsed query, so I get the results I wanted. I don't
> know why the dismax handler works this way.
>
> Can someone tell me the difference between the dismax and edismax handlers?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/character-in-search-query-tp3168604p3184941.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to get solr core information using solrj

2011-07-25 Thread Erick Erickson
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/CoreAdminRequest.html

That should get you started.
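For example, a minimal sketch (SolrJ 3.x; the URL is an assumption, and it
must point at the root Solr context rather than at a single core):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class CoreInfo {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // Passing null as the core name asks for the status of all cores.
    CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
    System.out.println(status.getCoreStatus());
  }
}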


Best
Erick

On Tue, Jul 19, 2011 at 11:40 PM, Jiang mingyuan
 wrote:
> hi all,
>
> Our Solr server contains two cores, core0 and core1, and they both work well.
>
> Now I'm trying to find a way to get information about core0 and core1.
>
> Can SolrJ or another API do this?
>
>
> thanks very much.
>


Re: strip html from data

2011-07-25 Thread Markus Jelsma
Are you looking at the returned result set or at what was actually indexed?
Analyzers are not run on the stored data, only on the indexed data.
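A quick way to see the difference, as a sketch (the field name "body" and
the URL are assumptions, not from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StoredVsIndexed {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    // The match works because the *indexed* tokens had their tags stripped...
    QueryResponse rsp = server.query(new SolrQuery("body:hello"));
    // ...but the *stored* value comes back verbatim, tags and all.
    System.out.println(rsp.getResults().get(0).getFieldValue("body"));
  }
}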

On Monday 25 July 2011 15:03:18 Merlin Morgenstern wrote:
> sounds logical. I just changed it to the following, restarted and reindexed
> with commit:
> 
>   <fieldType name="..." class="solr.TextField"
>     positionIncrementGap="100" autoGeneratePhraseQueries="true">
>     <analyzer type="index">
>       <charFilter class="solr.HTMLStripCharFilterFactory"/>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     </analyzer>
>   </fieldType>
> 
> Unfortunately that did not fix the error. There are still HTML tags inside
> the data. I believe there are fewer than before, but I cannot prove that.
> Fact is, there are still HTML tags inside the data.
> 
> Any other ideas what the problem could be?
> 
> 
> 
> 
> 
> 2011/7/25 Markus Jelsma 
> 
> > You've three analyzer elements, i wonder what that would do. You need to
> > add
> > the char filter to the index-time analyzer.
> > 
> > On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> > > Hi there,
> > >
> > > I am trying to strip HTML tags from the data before adding the
> > > documents to the index. To do that I altered schema.xml like this:
> > > <fieldType name="..." class="solr.TextField"
> > >   positionIncrementGap="100" autoGeneratePhraseQueries="true">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >       generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > >       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >       generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > >       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > >   </analyzer>
> > >   <analyzer>
> > >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > <field name="..." type="..." required="false"/>
> > > 
> > > Unfortunately this does not work; the HTML tags are still
> > > present after restarting and reindexing. I also tried
> > > HTMLStripTransformer, but that did not work either.
> > >
> > > Does anybody have an idea how to get this done? Thank you in advance
> > > for any hint.
> > > 
> > > Merlin
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: strip html from data

2011-07-25 Thread Merlin Morgenstern
sounds logical. I just changed it to the following, restarted and reindexed
with commit:

 <fieldType name="..." class="solr.TextField"
   positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="1" generateNumberParts="1" catenateWords="1"
       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="1" generateNumberParts="1" catenateWords="0"
       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
   </analyzer>
 </fieldType>

Unfortunately that did not fix the error. There are still HTML tags inside
the data. I believe there are fewer than before, but I cannot prove that.
Fact is, there are still HTML tags inside the data.

Any other ideas what the problem could be?





2011/7/25 Markus Jelsma 

> You've three analyzer elements; I wonder what that would do. You need to
> add the char filter to the index-time analyzer.
>
> On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> > Hi there,
> >
> > I am trying to strip HTML tags from the data before adding the documents
> > to the index. To do that I altered schema.xml like this:
> >
> > <fieldType name="..." class="solr.TextField"
> >   positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >       generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >       generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >   </analyzer>
> >   <analyzer>
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > <field name="..." type="..." required="false"/>
> > 
> >
> > Unfortunately this does not work; the HTML tags are still
> > present after restarting and reindexing. I also tried
> > HTMLStripTransformer, but that did not work either.
> >
> > Does anybody have an idea how to get this done? Thank you in advance
> > for any hint.
> >
> > Merlin
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Re: strip html from data

2011-07-25 Thread Markus Jelsma
You've three analyzer elements; I wonder what that would do. You need to add
the char filter to the index-time analyzer.

On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> Hi there,
> 
> I am trying to strip HTML tags from the data before adding the documents to
> the index. To do that I altered schema.xml like this:
> 
> <fieldType name="..." class="solr.TextField"
>   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>       generateWordParts="1" generateNumberParts="1" catenateWords="0"
>       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="..." type="..." required="false"/>
> 
> 
> Unfortunately this does not work; the HTML tags are still
> present after restarting and reindexing. I also tried
> HTMLStripTransformer, but that did not work either.
>
> Does anybody have an idea how to get this done? Thank you in advance
> for any hint.
> 
> Merlin

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


in fragsize whats the pre hit number and post hit number

2011-07-25 Thread jame vaalet
hi,
while searching for the word "SOLR" in
"highlighting in solr can be manipulated"
with fragsize=10:

how is the fragment decided? How many characters are taken before the word
SOLR and after the word SOLR?


jame


SolrJ and class versions

2011-07-25 Thread Tarjei Huse
Hi, I recently went through a little hell when I upgraded my Solr
servers to 3.2.0. What I didn't anticipate was that my Java SolrJ
clients depend on the server version.

I would like to add a note about this in the SolrJ docs:
http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update

Any comments with regard to this?

Are all SolrJ methods dependent on Java Serialization and thus class
versions?
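As far as I know, the coupling is the javabin binary format rather than Java
serialization proper, and its version changed between releases. A minimal
sketch of the usual workaround, forcing XML on the wire (SolrJ 3.x class
names; the URL is an assumption):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlWireClient {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    // Parse responses as XML so the client no longer depends on the
    // server's javabin version.
    server.setParser(new XMLResponseParser());
    System.out.println(server.ping().getStatus());
  }
}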

-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Wiki Error JSON syntax

2011-07-25 Thread Remy Loubradou
Hey Stephan,

Thanks, but I already used that Solr client, and I got an error when I add
too many documents: "FATAL ERROR: JS Allocation failed - process out of
memory".
I didn't find the source of the problem in that client, so I decided to
write my own, hopefully without this error, using JSON documents instead of
XML documents. I read a post saying that I can get better performance using
JSON documents.

I will release this client as an npm module.

Regards,
Remy

2011/7/25 Stefan Matheis 

> Remy,
>
> didn't use it myself... but you know about https://github.com/gsf/node-solr ?
>
> Regards
> Stefan
>
> On 20.07.2011 20:05, Remy Loubradou wrote:
>
>> I think I can trust you but this is weird.
>> Funny thing: if you try to validate this JSON on http://jsonlint.com/,
>> duplicate keys are automatically removed. But the thing is, how can you
>> possibly generate this JSON from a JavaScript object?
>>
>> It would be really nice to combine both ways that you show on the page.
>> Something like:
>>
>> {
>>   "add": [
>>     {
>>       "doc": {
>>         "id": "DOC1",
>>         "my_boosted_field": {
>>           "boost": 2.3,
>>           "value": "test"
>>         },
>>         "my_multivalued_field": [
>>           "aaa",
>>           "bbb"
>>         ]
>>       }
>>     },
>>     {
>>       "commitWithin": 5000,
>>       "overwrite": false,
>>       "boost": 3.45,
>>       "doc": {
>>         "f1": "v2"
>>       }
>>     }
>>   ],
>>   "commit": {},
>>   "optimize": {
>>     "waitFlush": false,
>>     "waitSearcher": false
>>   },
>>   "delete": [
>>     {
>>       "id": "ID"
>>     },
>>     {
>>       "query": "QUERY"
>>     }
>>   ]
>> }
>>
>> Thank you for your previous response, Yonik.
>>
>> 2011/7/20 Yonik Seeley
>>
>>> On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou wrote:
>>>> Hi,
>>>> I was writing a Solr Client API for Node and I found an error on this
>>>> page, http://wiki.apache.org/solr/UpdateJSON, on the section "Update
>>>> Commands": the JSON is not valid because there are duplicate keys, with
>>>> "add" and "delete" each appearing two times.
>>>
>>> It's a common misconception that it's invalid JSON. Duplicate keys
>>> are in fact legal.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>> I tried with an array and it doesn't work either; I got error
>>>> 400, I think that's because the syntax is bad.
>>>>
>>>> I don't really know if I am at the good place to talk about that, but
>>>> that's the only place I found. Sorry if it's not.
>>>>
>>>> Thanks,
>>>>
>>>> And I love Solr :)


Re: Is anybody using lotsofcores feature in production?

2011-07-25 Thread Markus Jelsma
No, I missed something and interpreted the question as using a lot of cores.

> LotsOfCores does not exist as a feature. It is just a write-up, some jira
> issues and a couple of patches. Did I miss something?
> 
> On Sun, Jul 24, 2011 at 8:26 PM, Markus Jelsma wrote:
> > It works fine, but you should keep an eye on additional overhead: cores
> > `stealing` too much CPU from others, trouble with cores that merge
> > segments stealing I/O, and of course RAM. It can also result in quite a
> > high number of open file descriptors.
> > 
> > There are more, but these seem most common to me.
> > 
> > > Hi,
> > > 
> > > Is anybody using the lots-of-cores feature in production? Is this
> > > feature scalable? I have around 1000 cores and want to use this
> > > feature. Will there be any issues in production?
> > > 
> > > http://wiki.apache.org/solr/LotsOfCores
> > > 
> > > Thanks,
> > > Umesh
> > > 
> > > --
> > 
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Is-anobdy-using-lotsofcores-feature-in-production-tp3193798p3193798.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is anybody using lotsofcores feature in production?

2011-07-25 Thread Shalin Shekhar Mangar
LotsOfCores does not exist as a feature. It is just a write-up, some jira
issues and a couple of patches. Did I miss something?

On Sun, Jul 24, 2011 at 8:26 PM, Markus Jelsma wrote:

> It works fine, but you should keep an eye on additional overhead: cores
> `stealing` too much CPU from others, trouble with cores that merge segments
> stealing I/O, and of course RAM. It can also result in quite a high number
> of open file descriptors.
>
> There are more, but these seem most common to me.
>
> > Hi,
> >
> > Is anybody using the lots-of-cores feature in production? Is this feature
> > scalable? I have around 1000 cores and want to use this feature. Will
> > there be any issues in production?
> >
> > http://wiki.apache.org/solr/LotsOfCores
> >
> > Thanks,
> > Umesh
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Is-anobdy-using-lotsofcores-feature-in-production-tp3193798p3193798.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: dih fetching but not adding records to index

2011-07-25 Thread Gora Mohanty
On Fri, Jul 22, 2011 at 12:42 AM, abhayd  wrote:
> hi
> I'm trying to load data into the Solr index from an XML file using DIH.
>
> my promotions.xml file
> --
> <add>
>   <doc>
>     <field name="...">3</field>
>   </doc>
>   <doc>
>     <field name="...">4</field>
>   </doc>
> </add>
[...]

This is already a complete SolrXML file, and you do not
need DIH. Instead, use post.sh in example/exampledocs
in your Solr distribution. With Solr running in the embedded
Jetty server, the command would be:
  ./post.sh promotions.xml
If you are running Solr in some other fashion, please modify
post.sh as needed.
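If post.sh is not convenient, an equivalent curl invocation (assuming the
default example URL) would be:
  curl 'http://localhost:8983/solr/update?commit=true' \
    -H 'Content-Type: text/xml' --data-binary @promotions.xml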

Regards,
Gora