Re: stemming the index

2010-07-07 Thread Jaran Nilsen
Although I have not tested it myself yet, the Lucene-Hunspell project might be worth a look: http://code.google.com/p/lucene-hunspell/ Jaran On Wed, Jul 7, 2010 at 10:15 PM, sarfaraz masood < sarfarazmasood2...@yahoo.com> wrote: > Thanx Erick > :-) > > --- On Thu, 8/7/10, Erick Ericks

Re: index format error because disk full

2010-07-07 Thread Li Li
I used SegmentInfos to read the segment_N file and found that the error occurs because it tries to load deletedDocs, but the .del file's size is 0 (because of the disk error). So I used SegmentInfos to set delGen=-1 to ignore the deleted docs. But I think there is a bug. The write logic may be -- it first writes the

Re: tomcat solr logs

2010-07-07 Thread Jeff Hammerbacher
Hey Robert, You may want to check out Flume for log file collection: http://github.com/cloudera/flume. We don't currently allow Flume to populate a Solr index, but that would be quite an interesting use case! Later, Jeff On Wed, Jun 30, 2010 at 3:06 PM, Robert Petersen wrote: > Sorry if this i

Re: year range field, proper data type?

2010-07-07 Thread Lance Norskog
There is no 'trie string'. If you use a trie type for this problem, sorting will take much less memory. Sorting strings uses memory both per document and per unique term. The Trie types do not use any memory per unique term. So, yes, a Trie Integer is a good choice for this problem. On Wed, Jul 7

Re: Handling Updates

2010-07-07 Thread Lance Norskog
You can pass variables to the DIH from the URL parameters. This would let you pass a query term into the DIH operation. On Wed, Jul 7, 2010 at 11:53 AM, Frank A wrote: > I'm still pretty new to SOLR and have a question about handling updates.  I > currently have a db-config to do a bulk import.  
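A minimal sketch of the idea above, assuming a DataImportHandler registered at the conventional `/dataimport` path on a local Solr. The helper `dih_url` and the parameter name `item_id` are hypothetical; in the DIH configuration such a request parameter is usually referenced as `${dataimporter.request.item_id}`, but check this against your Solr version.

```python
from urllib.parse import urlencode


def dih_url(base, command, **params):
    """Build a DIH request URL; extra keyword arguments become URL parameters."""
    query = {"command": command}
    query.update(params)
    return base + "?" + urlencode(query)


# Hypothetical example: import only the row identified by item_id,
# without clearing the existing index first (clean=false).
url = dih_url(
    "http://localhost:8983/solr/dataimport",  # assumed handler location
    "full-import",
    item_id="42",
    clean="false",
)
print(url)
```

The URL is only built here, not requested, so the sketch can be tried without a running Solr instance.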

Re: Per-user results sets

2010-07-07 Thread Lance Norskog
Yes, for a user's query you would include a different set of boosts as a parameter in the search request. It's easy. You need the user->boost set mapping in your front end, not in Solr. On Wed, Jul 7, 2010 at 8:44 AM, Jean-Michel Philippon-Nadeau wrote: > Hi list, > > I am wondering if Solr/Lucen
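As a rough sketch of keeping the user-to-boost mapping in the front end: the code below builds search parameters with one dismax `bq` (boost query) clause per user preference. The `USER_BOOSTS` table, the user names, and the `category` field are all made up for illustration; only the parameter-building pattern is the point.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical per-user boosts, stored in the front end rather than in Solr.
USER_BOOSTS = {
    "alice": {"category:books": 2.0, "category:music": 0.5},
    "bob": {"category:music": 3.0},
}


def build_params(user, q):
    """Return an encoded query string with one bq clause per user boost."""
    params = [("q", q), ("defType", "dismax")]
    for clause, weight in USER_BOOSTS.get(user, {}).items():
        params.append(("bq", f"{clause}^{weight}"))
    return urlencode(params)


encoded = build_params("alice", "solr")
# Decode again to show the boost clauses that would be sent.
print(parse_qs(encoded)["bq"])
```

A user with no entry in the table simply gets the unboosted query, so the same code path serves anonymous searches.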

Re: index format error because disk full

2010-07-07 Thread Lance Norskog
If autocommit does not do an automatic rollback, that is a serious bug. There should be a way to detect that an automatic rollback has happened, but I don't know what it is. Maybe something in the Solr MBeans? On Wed, Jul 7, 2010 at 5:41 AM, osocurious2 wrote: > > I haven't used this myself, but

Re: ClassCastException SOLR

2010-07-07 Thread Lance Norskog
TokenFilterFactory is an interface. Your factory class has to implement this interface. If you look at the Lucene factories, they all subclass from BaseTokenFilterFactory which then subclasses from BaseTokenStreamFactory. That last one does various things for the child factories (I don't know wha

Using hl.regex.pattern to print complete lines

2010-07-07 Thread Peter Spam
Hi, I have a text file broken apart by carriage returns, and I'd like to only return entire lines. So, I'm trying to use this: &hl.fragmenter=regex &hl.regex.pattern=^.*$ ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so. I also tried a pattern of

Re: Differences in avgRequestsPerSecond of Solr ..

2010-07-07 Thread Chris Hostetter
: I am fetching the following details programmatically : 1) you didn't tell us how you were fetching those details programmatically .. what URL are you using? 2) the fact that the handlerStart times are different suggests that you are not looking at the same handler (maybe you are looking at tw

Re: Solr Cloud/ Solr integration with zookeeper

2010-07-07 Thread Chris Hostetter
: with multicore. i cannot access: : http://localhost:8983/solr/collection1/admin/zookeeper.jsp why would you expect that URL to work? you don't have a core named "collection1" in the solr.xml you posted... : : : ...the only time "collection1" appears is as the defaultCoreName, but unless

Re: indexing xml document with literals

2010-07-07 Thread Chris Hostetter
: Does anyone know how to read in data from one or more of the example xml docs : and ALSO store the filename and path from which it came? Solr has no knowledge that your "xml docs" are actually files ... the XML syntax ("...") is just a serialization mechanism for streaming data to solr about

Re: Solrj throws RuntimeException - Invalid version or the data is not in javabin format

2010-07-07 Thread Chris Hostetter
: Ubuntu server (see exception below). The same configuration works when : injecting from a Windows client to a Windows server. interesting ... so you're saying that if you use the exact same SolrJ code, and just change the host:port, it works on windows? are you certain that the version of So

Re: terms separated by -

2010-07-07 Thread Erick Erickson
Take a look at WordDelimiterFilterFactory Erick On Wed, Jul 7, 2010 at 4:15 PM, sarfaraz masood < sarfarazmasood2...@yahoo.com> wrote: > There are terms in my data like : one-way , separated by '-' , now the > problem is that the standard analyzer is considering these as a single term > inst

Re: stemming the index

2010-07-07 Thread sarfaraz masood
Thanx Erick :-) --- On Thu, 8/7/10, Erick Erickson wrote: From: Erick Erickson Subject: Re: stemming the index To: solr-user@lucene.apache.org Date: Thursday, 8 July, 2010, 1:33 AM The short answer is "there isn't a single analyzer and stemmer that really work well for mixed-language indexing

terms separated by -

2010-07-07 Thread sarfaraz masood
There are terms in my data like one-way, separated by '-'. The problem is that the standard analyzer treats these as a single term instead of two, but I need them to be stored as two terms in the index. How can I do this? Sarfaraz
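To make the desired outcome concrete, here is a small Python illustration of splitting hyphenated tokens into separate terms. It is not Solr or Lucene code and does not reflect how any analyzer is implemented; the function name `split_hyphenated` is hypothetical.

```python
import re


def split_hyphenated(text):
    """Split text on whitespace, then break each token on '-'."""
    terms = []
    for token in text.split():
        # Split on runs of hyphens and drop empty pieces
        # (e.g. from a leading or trailing '-').
        terms.extend(part for part in re.split(r"-+", token) if part)
    return terms


print(split_hyphenated("a one-way street"))
```

In Solr itself this kind of splitting would be configured in the field's analysis chain in schema.xml rather than written by hand.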

Re: stemming the index

2010-07-07 Thread Erick Erickson
The short answer is "there isn't a single analyzer and stemmer that really work well for mixed-language indexing and searching". Take a look through the mail archive, try searching for multilanguage or multi-language or multiple languages. There's a wealth of info there because this topic has been di

Re: year range field, proper data type?

2010-07-07 Thread Erick Erickson
This isn't a very worrisome case. Most of the messages you see on the board about the dangers of dates arise because dates can be stored with many unique values if they include milliseconds. Then, when sorting on date your memory explodes because all the dates are loaded into memory. In your case,

stemming the index

2010-07-07 Thread sarfaraz masood
My index contains data in 2 different languages, English & German. Which analyzer & stemmer should be applied to this data before feeding it to the index? -Sarfaraz

Re: Wildcards queries

2010-07-07 Thread Erick Erickson
You need to look carefully at your schema.xml. There are plenty of comments in that file describing what's going on. That's where you set up your analyzers by chaining together various tokenizers and filters. I think you're confused about indexing and storing. Generally it's a bad practice to allo

Re: Adding new elements to index

2010-07-07 Thread Erick Erickson
Hmmm, let's see your schema definitions please. I'm suspicious because you've implied that you do use a unique key. If it's required, then your definitions don't select it into the same name (i.e. you select as id_carrer in one and id_hidrant in another). So if id_hidrant was defined as your unique

Handling Updates

2010-07-07 Thread Frank A
I'm still pretty new to SOLR and have a question about handling updates. I currently have a db-config to do a bulk import. I have a single root entity and then some data that comes from other tables. This works fine for an initial bulk load. However, once indexed, is there a way I can tell SOLR

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Yonik Seeley
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll wrote: > Originally, I had intended that it was just for one Field Sub Type, thinking > that if we ever wanted multiple sub types, that a new, separate class would > be needed Right - this was my original thinking too. AbstractSubTypeFieldType i

Per-user results sets

2010-07-07 Thread Jean-Michel Philippon-Nadeau
Hi list, I am wondering if Solr/Lucene can help improve my existing search engine. I would like to have different results for each user - but still have relevant results. Each user would have different score multipliers for each searchable item. Is this something possible? Thanks, -- Jean-Mic

year range field, proper data type?

2010-07-07 Thread Jonathan Rochkind
So I will have a solr field that contains "years", ie, "1990", "2010", maybe even "1492", "1209" and "907"/"0907". I will be doing range limits over this field. Ie, [1950 TO 1975] or what have you. The data represents publication dates of books on a large library's shelves; there will be aroun
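The "907"/"0907" question matters because string comparison is lexicographic. The following plain-Python sketch (not Solr code; both function names are hypothetical) shows an unpadded three-digit year falling inside a four-digit string range, and how zero-padding or integer comparison avoids that.

```python
def in_range_as_strings(year, low, high):
    """Compare years as strings, character by character."""
    return low <= year <= high


def in_range_as_ints(year, low, high):
    """Compare years numerically."""
    return int(low) <= int(year) <= int(high)


# Unpadded "907" wrongly lands inside the string range [1000 TO 9999].
print(in_range_as_strings("907", "1000", "9999"))
# Zero-padded "0907" sorts correctly as a string.
print(in_range_as_strings("0907", "1000", "9999"))
# Integer comparison gives the numeric answer regardless of padding.
print(in_range_as_ints("907", "1000", "9999"))
```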

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Mark Allan
Currently our only requirement is to be able to search on the numerical part of the daterange field, so our field type overrides getRangeQuery and getFieldQuery to consider only the first two subfields. If we wanted to be able to search the name subfield as well, I suppose we could do some

Re: index format error because disk full

2010-07-07 Thread osocurious2
I haven't used this myself, but Solr supports a rollback function (http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22). It is supposed to roll back to the state at the previous commit. So you may want to turn off auto-commit on the index you are updating if you want to control what that
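A hedged sketch of what issuing that rollback could look like from a client: the code prepares an HTTP POST carrying the `<rollback/>` update message. The update URL is an assumption about a default local install, and the helper `rollback_request` is hypothetical. The request is only constructed, not sent.

```python
import urllib.request


def rollback_request(update_url):
    """Prepare (but do not send) a POST with a <rollback/> update message."""
    return urllib.request.Request(
        update_url,
        data=b"<rollback/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = rollback_request("http://localhost:8983/solr/update")  # assumed URL
print(req.get_method(), req.full_url, req.data)
# To actually send it against a running Solr: urllib.request.urlopen(req)
```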

ClassCastException SOLR

2010-07-07 Thread Martin Kysel
Hi, I am trying to make a Lucene module for SKOS-based synonym expansion. When I tried to implement the Filter in SOLR, I got a ClassCastException. So I tried to take one of the existing SOLR Filters and FilterFactories, change the package information, compress into a jar and use it as a plugin.

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Grant Ingersoll
This looks reasonable. I'll take a look at the patch. Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed, but if this proves to be clean this way, then I see no reason not to incorpo

Re: Adding new elements to index

2010-07-07 Thread Alexey Serba
1) Shouldn't you put your "entity" elements under "document" tag, i.e. ... ... 2) What happens if you try to run full-import with explicitly specified "entity" GET parameter? command=full-import&entity=carrers command=full-import&entity=hidrants On Wed, Jul 7, 2010 at 11:1

Re: DataImportHandler and cron issue

2010-07-07 Thread Govind Kanshi
How did you verify it was not processed? Did you 1. Query for docs - with no results 2. Use Solr Admin tool? 3. Bypass data import handler and see if the doc post/commit works. On Tue, Jun 15, 2010 at 10:29 PM, iboppana wrote: > > Hi All, > > We are trying implement solr for our newspapers site

Huge pages

2010-07-07 Thread Glen Newton
I was wondering if anyone has any experience using huge pages[1] to improve SOLR (or Lucene) performance (esp on 64bit). Some are reporting major performance gains in large, memory-intensive applications (like EJBs)[2]. Also, ephemeral but significant performance reductions have been solved usin

Re: Adding new elements to index

2010-07-07 Thread Govind Kanshi
Just for testing purpose - I would 1. Use curl to create new docs 2. Use Solrj to go to individual dbs and collect docs. On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez wrote: > Thanks for the quick reply! > > In fact it was a typo, the 200 rows I got were from postgres. I tried to > say > t

Re: Adding new elements to index

2010-07-07 Thread Xavier Rodriguez
Thanks for the quick reply! In fact it was a typo, the 200 rows I got were from postgres. I tried to say that the full-import was omitting the 100 oracle rows. When I run the full import, I run it as a single job, using the url command=full-import. I've tried to clear the index both using the cle