Lucene index verifier

2008-02-07 Thread Lance Norskog
(Sorry, my Lucene java-user access is wonky.) I would like to verify that my snapshots are not corrupt before I enable them. What is the simplest program to verify that a Lucene index is not corrupt? Or, what is a Solr query that will verify that there is no corruption? With the minimum amoun

RE: Memory improvements

2008-02-07 Thread Lance Norskog
Solr 1.2 has a bug where if you say "commit after N documents" it does not. But it does honor the "commit after N milliseconds" directive. This is fixed in Solr 1.3. -Original Message- From: Sundar Sankaranarayanan [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 3:30 PM To:

Many updates slow down SOLR performance, no commit/autocommit

2008-02-07 Thread Fuad Efendi
Question: Why do constant updates slow down SOLR performance even if I am not executing Commit? I just noticed this... Thread dump shows something "Lucene ... Clone()", and significant CPU usage. I did about 5 million updates via HTTP XML, a single document at a time, without commit, and performance went

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
> With DisMax, and simple query which is single double-quote > character, SOLR > responds with > 500 > org.apache.solr.common.SolrException: Cannot parse '': ... > It is not polite neither to user's input nor to HTTP specs... Ooohh... Sorry again: it is the only case where SOLR is polite with

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
With DisMax, and simple query which is single double-quote character, SOLR responds with 500 org.apache.solr.common.SolrException: Cannot parse '': Encountered "" at line 1, column 0. Was expecting one of: ... " " ... "-" ... "(" ... "*" ... ... ... ... ... "[" ... "{" ... ... org.apache.lucene.q

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Yonik Seeley
On Feb 7, 2008 8:35 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote: > - is it a bug of DisMax?... It happens even before request reaches dismax. That's what this whole thread has been about :-) Stripping unbalanced quotes is part of dismax. -Yonik

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
> > I think: no. And 6'2" works just as prescribed: > > Not really... it depends on the analyzer. If the index analyzer for > the field ends up stripping off the trailing quote anyway, then the > dismax query (which also dropped the quote) will match documents. > That's why you don't see any issu

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Yonik Seeley
On Feb 7, 2008 6:35 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote: > Anyway I can't understand where is the problem?!! Everything works fine with > dismax/standard/escaping/encoding. > Can we use AND operator with dismax by > the way? No. > I think: no. And 6'2" works just as prescribed: Not really

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
> (catalina.out file of SOLR, > http://www.tokenizer.org/armani/price.htm?q=Romeo%2bJuliet > from production) > ... > ... DISMAX queries via CONSOLE do not support > that... Oops... Another mistake, sorry. http://192.168.1.5:18080/apache-solr-1.2.0/select?indent=on&version=2.2&q=Romeo%2BJuliet&s

Memory improvements

2008-02-07 Thread Sundar Sankaranarayanan
Hi All, I am running an application in which I have to index about 300,000 records of a table which has 6 columns. I am committing to the solr server after every 10,000 rows and I observed that by the end of about 150,000 rows the process eats up about 1 Gig of memory, and since my se

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
> while i agree that you don't want to expose your end users > directly to > Solr (largely for security reasons) that doesn't mean you *must* > preprocess user entered strings before handing them to dismax > ... dismax's > whole goal is to make it possible for apps to not have to worry about

Socket exception

2008-02-07 Thread Sundar Sankaranarayanan
Hi All, I have been using Solr for a couple of months now and am very satisfied with it. My solr on the dev environment runs on a windows box with 1 gig memory and the solr.war is deployed on jboss version 4.05. When investigating a "Solr commit not working sometimes" issue in our applicati

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Chris Hostetter
: It is not a bug/problem of SOLR. SOLR can't be exposed directly to end : users. For handling user input and generating SOLR-specific query, use while i agree that you don't want to expose your end users directly to Solr (largely for security reasons) that doesn't mean you *must* preprocess us

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Chris Hostetter
: http://192.168.1.5:18080/apache-solr-1.2.0/select/?q=*&version=2.2&start=0&rows=10&indent=on That's using the standard request handler, right? ... that's a much different discussion -- when using standard you must of course be aware of the syntax and the special characters ... Walter and i hav

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
This is what appears in Address Bar of IE: http://localhost:8080/apache-solr-1.2.0/select/?q=item_name%3A%22Romeo%2BJuliet%22%2Bcategory%3A%22books%22&version=2.2&start=0&rows=10&indent=on Input was: item_name:"Romeo+Juliet"+category:"books" Another input which works just fine: item_name:"6'\""

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
Try this query with asterisk * http://192.168.1.5:18080/apache-solr-1.2.0/select/?q=*&version=2.2&start=0&rows=10&indent=on Response: HTTP Status 400 - Query parsing error: Cannot parse '*': '*' or '?' not allowed as first character in WildcardQuery

Re: strange updating inconsistency

2008-02-07 Thread Chris Hostetter
: odd behavior while updating. The use case is that a document gets indexed : with a status, in this case it's -1 for documents that aren't ready to be : searched yet and 1 otherwise. Initial indexing works perfectly, and getting : a result set of documents with the status of -1 works as well.

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Chris Hostetter
: Our users can blow up the parser without special characters. : : AND THE BAND PLAYED ON : TO HAVE AND HAVE NOT Grrr... yeah, i'd forgotten about that problem. I was hoping LUCENE-682 could solve that (by "unregistering" AND/OR/NOT as operators) but that issue is fairly dead in the water si

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
I forgot to mention: default operator is AND; DisMax. Without URL-encoding some queries will show exceptions even with dismax. > -Original Message- > From: Fuad Efendi [mailto:[EMAIL PROTECTED] > Sent: Thursday, February 07, 2008 3:31 PM > To: solr-user@lucene.apache.org > Subject: RE: Qu

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
This query works just fine: http://www.tokenizer.org/?q=Romeo+%2B+Juliet %2B is URL-Encoded presentation of + It shows, for instance, [Romeo & Juliet] in output. > -Original Message- > From: Walter Underwood [mailto:[EMAIL PROTECTED] > Sent: Thursday, February 07, 2008 3:25 PM > To: sol
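
The encoding described in this message can be reproduced with a short sketch. It assumes the front end builds the URL itself; the helper name `encode_query_param` is hypothetical and not part of Solr or of the site mentioned above. It only illustrates that a literal `+` has to travel as `%2B`, because a bare `+` in a query string is decoded as a space.

```python
from urllib.parse import quote, quote_plus, unquote_plus


def encode_query_param(raw: str) -> str:
    """Percent-encode raw user input for use as a URL parameter value.

    Hypothetical helper for illustration: spaces become '+', and a
    literal '+' becomes '%2B' so it is not decoded as a space.
    """
    return quote_plus(raw)


if __name__ == "__main__":
    # Same shape as the URL quoted in the message: q=Romeo+%2B+Juliet
    print(encode_query_param("Romeo + Juliet"))
    # A lone plus sign, encoded with no characters treated as safe
    print(quote("+", safe=""))
    # Decoding restores the original user input
    print(unquote_plus("Romeo+%2B+Juliet"))
```

Encoding only protects the characters on their way through HTTP; the decoded text still reaches the query parser, so it is a separate concern from query-syntax escaping.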

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Walter Underwood
Our users can blow up the parser without special characters. AND THE BAND PLAYED ON TO HAVE AND HAVE NOT Lower-casing in the front end avoids that. We have auto-complete on titles, so there are plenty of chances to inadvertently use special characters: Romeo + Juliet Airplane! Sh
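
The front-end lower-casing mentioned here might look like the following sketch. It assumes the query parser only recognizes the upper-case words AND, OR and NOT as operators; the function name `neutralize_operator_words` is made up for illustration. The message describes lower-casing the input as a whole (`text.lower()`); this variant touches only the operator words so the rest of the title keeps its case.

```python
OPERATOR_WORDS = {"AND", "OR", "NOT"}


def neutralize_operator_words(user_text: str) -> str:
    """Lower-case bare AND/OR/NOT so they are parsed as ordinary terms.

    Hypothetical helper: words are split on whitespace, so runs of
    spaces in the input collapse to a single space in the output.
    """
    words = user_text.split()
    return " ".join(w.lower() if w in OPERATOR_WORDS else w for w in words)


if __name__ == "__main__":
    for title in ("AND THE BAND PLAYED ON", "TO HAVE AND HAVE NOT"):
        print(neutralize_operator_words(title))
```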

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Fuad Efendi
I have same kind of queries correctly working on my site. It's probably because I am using URL Escaping: http://www.tokenizer.org/?q=6%272%22 > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Yonik Seeley > Sent: Thursday, February 07, 2008 12:58 P

Re: solrcofig.xml - need some info

2008-02-07 Thread Chris Hostetter
: I am pretty new to solr. I was wondering what is this "mm" attribute in : requestHandler in solrconfig.xml and how it works. Tried to search wiki : could not find it Hmmm... yeah wiki search does mid-word matching doesn't it? the key thing to realize is that the requestHandler you were looking a

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Chris Hostetter
: How about the query parser respecting backslash escaping? I need one of the original design decisions was "no user escaping" ... be able to take in raw query strings from the user with only '+' '-' and '"' treated as special characters ... if you allow backslash escaping of those characters

solrcofig.xml - need some info

2008-02-07 Thread Ismail Siddiqui
I am pretty new to solr. I was wondering what is this "mm" attribute in requestHandler in solrconfig.xml and how it works. Tried to search wiki could not find it 2<-1 5<-2 6<90% thanks Ismail
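
For readers wondering how a value such as `2<-1 5<-2 6<90%` is read, here is a rough sketch of the "minimum should match" arithmetic as commonly documented for dismax. It is an illustration under assumptions, not Solr's code: the function names are hypothetical, and percentages are truncated with integer arithmetic, which may differ from Solr's own rounding in edge cases.

```python
def _apply_simple(clause_count: int, value: str) -> int:
    """Apply one unconditional value: an integer or a percentage."""
    if value.endswith("%"):
        percent = int(value[:-1])
        amount = abs(clause_count * percent) // 100  # truncate toward zero
        # A negative percentage is the share that may be missing.
        return clause_count - amount if percent < 0 else amount
    number = int(value)
    # A negative integer is the number of clauses that may be missing.
    return clause_count + number if number < 0 else number


def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many optional clauses must match for a given spec."""
    result = clause_count  # below the first bound, every clause is required
    for part in spec.split():
        if "<" in part:
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                break  # this and later conditions do not apply
            result = _apply_simple(clause_count, value)
        else:
            result = _apply_simple(clause_count, part)
    return max(0, min(clause_count, result))


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 6, 7, 10):
        print(n, min_should_match(n, "2<-1 5<-2 6<90%"))
```

Read this way, the value in the question means: with one or two clauses all must match, with three to five all but one, with six all but two, and beyond six about 90% of them.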

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Andrzej Bialecki
Doug Cutting wrote: Ning, I am also interested in starting a new project in this area. The approach I have in mind is slightly different, but hopefully we can come to some agreement and collaborate. I'm interested in this too. My current thinking is that the Solr search API is the appropri

Re: uniqueKey gives duplicate values

2008-02-07 Thread Yonik Seeley
On Feb 7, 2008 2:51 PM, vijay_schi <[EMAIL PROTECTED]> wrote: > I want to know, what type of analyzers can be used for the data 12345_r, > 12346_r, 12345_c, 12346_c etc , type of data. > > I had text type for that uniqueKey and some query , index analyzers on it. i > think thats making duplicat

Re: uniqueKey gives duplicate values

2008-02-07 Thread vijay_schi
I want to know, what type of analyzers can be used for the data 12345_r, 12346_r, 12345_c, 12346_c etc , type of data. I had text type for that uniqueKey and some query , index analyzers on it. i think thats making duplicates. Yonik Seeley wrote: > > On Feb 7, 2008 2:27 PM, vijay_schi <[

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Walter Underwood
Huh? Queries come in through URL parameters and this is all ASCII anyway. Even in XML, entities and UTF-8 decode to the same characters after parsing. The glyph formerly known as Prince belongs in the private use area, of course. wunder On 2/7/08 11:06 AM, "Lance Norskog" <[EMAIL PROTECTED]> wro

Re: uniqueKey gives duplicate values

2008-02-07 Thread Yonik Seeley
On Feb 7, 2008 2:27 PM, vijay_schi <[EMAIL PROTECTED]> wrote: > I'm new to solr. I have a uniqueKey on string which has the data of > 12345_r,12346_r etc etc. > when I'm posting xml with same data second time, it allows the docs to be > added. when i search for id:12345_r on solr client , i'm getti

uniqueKey gives duplicate values

2008-02-07 Thread vijay_schi
Hi, I'm new to solr. I have a uniqueKey on string which has the data of 12345_r,12346_r etc etc. when I'm posting xml with same data second time, it allows the docs to be added. when i search for id:12345_r on solr client , i'm getting multiple records. what might be the problem ? previously I'

RE: Query with literal quote character: 6'2"

2008-02-07 Thread Lance Norskog
Some people loathe UTF-8 and do all of their text in XML entities. This might work better for your punctuation needs. But it still won't help you with Prince :) -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 9:25 AM To: solr-user@luc

RE: Indexing Japanese & English

2008-02-07 Thread Paul Clegg
Yes, I've seen this bit. Near as I can tell, it's what I want, so that our Japanese users can search on a double-byte character and get back results (since they don't use spaces to delineate words, it's impossible in the default solr configuration to find a single double-byte character somewhere "

Re: how to improve concurrent request performance and stress testing

2008-02-07 Thread Chris Hostetter
: Thank you so much! I will look into firstSearcher configuration next! thanks FYI: prompted by this thread, I added some blurbs about firstSearcher, newSearcher, and FieldCache to the SolrCaching wiki ... as a new user learning about this stuff, please feel free to update that wiki with any

RE: Indexing Japanese & English

2008-02-07 Thread Lance Norskog
Here are the comments for CJKTokenizer. First, is this what you want? Remember, there are three Japanese writing systems. /** * CJKTokenizer was modified from StopTokenizer which does a decent job for * most European languages. It performs other token methods for double-byte * Characters: the

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Walter Underwood
How about the query parser respecting backslash escaping? I need free-text input, no syntax at all. Right now, I'm escaping every Lucene special character in the front end. I just figured out that it breaks for colon, can't search for "12:01" with "12\:01". wunder On 2/7/08 11:06 AM, "Chris Hoste
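
The front-end escaping described in this message could be sketched as below. The character list is an assumption based on the special characters of Lucene's classic query syntax (`+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`), and the helper name `escape_query_text` is hypothetical. It only shows the escaping step; it does not address the colon problem reported above.

```python
import re

# Assumed set of special characters in Lucene's classic query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')


def escape_query_text(text: str) -> str:
    """Prefix each special character (or && / ||) with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


if __name__ == "__main__":
    for raw in ("12:01", "6'2\"", "Airplane!", "Romeo + Juliet"):
        print(escape_query_text(raw))
```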

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Chris Hostetter
: I confirmed this behavior in trunk with the following query: : http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat : : The result is that the double quote is dropped: : +DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01) : : This seems like it's

Indexing Japanese & English

2008-02-07 Thread Paul Clegg
I hate asking stupid questions immediately after joining a mailing list, but I'm in a bit of a pinch here. I'm using Solr/Tomcat for a Ruby on Rails project (acts_as_solr) and I've had a lot of success getting it working -- for English. The problem I'm running into is that our primary customer

Re: Query with literal quote character: 6'2"

2008-02-07 Thread Yonik Seeley
On Feb 7, 2008 12:24 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > We have a movie with this title: 6'2" > > I can get that string indexed, but I can't get it through the query > parser and into DisMax. It goes through the analyzers fine. I can > run the analysis tool in the admin interface and

Search result not coming for normal special characters...

2008-02-07 Thread nithyavembu
Hi All, Now i am facing problem in special character search. I tried with the following special characters (!,@,#,$,%,^,&,*,(,),{,},[,]). My indexing data is : !national! @national@ #national# $national$ %national% ^national^ &national&

Query with literal quote character: 6'2"

2008-02-07 Thread Walter Underwood
We have a movie with this title: 6'2" I can get that string indexed, but I can't get it through the query parser and into DisMax. It goes through the analyzers fine. I can run the analysis tool in the admin interface and get a match with that exact string. These variants don't work: 6'2" 6'2\" 6

Highlight on non-text fields and/or field-match list

2008-02-07 Thread jnagro
I've done some searching through the archives and google, as well as some tinkering on my own, to no avail. My goal is to get a list of the fields that matched a particular query. At first, I thought highlighting was the solution; however, it's slowly becoming clear that it doesn't do what I need it

What should be the best config for a multilingual site

2008-02-07 Thread Leonardo Santagada
I'm working for a French/English site and I want to know what filters would be nice and are recommended. Should I use 2 stemmers or is there a way to mark one of them bilingual? I am using the latin-1 filter also, any more ideas? []'s -- Leonardo Santagada

Re: For an "XML" fieldtype

2008-02-07 Thread Frédéric Glorieux (École nationale des chartes)
Thanks Chris, this idea has been discussed before, most notably in this thread... http://www.nabble.com/Indexing-XML-files-to7705775.html ...as discussed there, the crux of the issue is not a special fieldtype, but a custom ResponseWriter that outputs the XML you want, and leaves any field va

Re: Search not working for indexed words...

2008-02-07 Thread nithyavembu
Thanks Yonik and Ard. Yes, it's the stemming problem and I have removed the "solr.EnglishPorterFilterFactory" from the indexing and querying analyzers. Now it's working fine. Will any other problem occur if I remove this? Thanks, nithya. -- View this message in context: http://www.nabble.com/S

Re: Search not working for indexed words...

2008-02-07 Thread nithyavembu
Thanks Yonik and Ard. Yes, it's the stemming problem and I have removed the "solr.EnglishPorterFilterFactory" from the indexing and querying analyzers. Now it's working fine. Will any other problem occur if I remove this? Thanks, nithya. Yonik Seeley wrote: > > It's stemming. Administrator st

Re: how to improve concurrent request performance and stress testing

2008-02-07 Thread Ziqi Zhang
Thank you so much! I will look into firstSearcher configuration next! thanks -- From: "Chris Hostetter" <[EMAIL PROTECTED]> Sent: Wednesday, February 06, 2008 8:56 PM To: Subject: Re: how to improve concurrent request performance and stress testin