sint and omitnorms

2008-10-10 Thread sanraj25

Hi,
  I created my own fields in schema.xml using the integer field type and
the sint field type (solr.SortableIntField), but I can't tell the
difference between the two. When exactly should sint be used? If we use
sint, how does it make the field sortable? I tested with {sort=field name}
in the query window, but it does not work properly. I have one more
question: what is the purpose of the omitNorms attribute? What happens if
we use omitNorms? Please explain with a clear example.
Thanks in advance.

-sanraj 
-- 
View this message in context: 
http://www.nabble.com/sint-and-omitnorms-tp19912537p19912537.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3 list of language managed org.apache.lucene.analysis

2008-10-10 Thread sunnyfr

Hi Hoss, Hi everybody,

Can you tell me more about fsync, and where I can find more information?
Another question: I no longer get OOMEs, but every time I start a full
import it stalls the database and locks out the other services that use it.
Last night the database sent an alert because other requests started
queuing up behind the import.

How can I run this import in small batches? I'm using MySQL.

Also, what should I do if, for example, I have to add a new language to the
schema, or add or change a field? Do I have to build a new full index?
Everywhere I read about "reindexing"; is that the same as a full index?
How can I manage that?

Thanks a lot,






hossman wrote:
 
 
 : I'm using solr1.3 and I would like to know where can I find a place where
 : you have the list of the language managed by solr :
 : like for greek in the example : org.apache.lucene.analysis.el.GreekAnalyzer.
 
 There isn't an explicit list of languages supported -- but if you look 
 at the javadocs, both for Solr and Lucene, you can get a very good sense 
 of what Tokenizers, TokenFilters, and Analyzers are included with Solr.
 
 There *may* be a few Analyzers in Lucene contribs which are not in Solr 
 OOTB, but they should be fairly easy to add as plugins...
 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-tree.html
 
 Keep in mind some Analysis classes (like SnowballPorterFilterFactory) 
 actually support many different languages based on runtime configuration.
 
 
 
 -Hoss
 
 
 




Re: How stop properly solr to modify solrconfig or ... files

2008-10-10 Thread sunnyfr

Hi Hoss, Mark, everybody,

Can you tell me more about fsync, and where I can find more information?
Another question: I no longer get OOMEs, but every time I start a full
import it stalls the database and locks out the other services that use it.
Last night the database sent an alert because other requests started
queuing up behind the import.

How can I run this import in small batches? I'm using MySQL.

Also, what should I do if, for example, I have to add a new language to the
schema, or add or change a field? Do I have to build a new full index?
Everywhere I read about "reindexing"; is that the same as a full index?
How can I manage that?

Thanks a lot,





markrmiller wrote:
 
 Chris Hostetter wrote:
 i thought the fsync additions to the version of Lucene in Solr 1.3 
 prevented situations like this (ensuring that your index was still usable, 
 even if some documents were lost) 
   
 It doesn't prevent it if the IO system has write caching enabled or if 
 you have a hard drive that lies to fsync to improve benchmark scores 
 (most consumer hard drives, I believe).
 
 - Mark
 
 




Re: solr 1.3 list of language managed org.apache.lucene.analysis

2008-10-10 Thread sunnyfr

Thanks Hoss,
:)





Buzz measurement - Aggregate functions

2008-10-10 Thread Marcus Herou
Hi.

Anyone have an idea of how I would create a query which finds the data
backing a trend graph where date is X and num(docs) is on Y axis ?

This is quite a common use case in buzz analysis and currently I'm doing a
stupid query which iterates over the date range and queries lucene for every
date. Not very fast and not very flexible.

More specifically something like this but I need to add free text query as
well and then I cannot use MySQL for performance reasons. Any ideas ?

--clip--
mysql> select count(id) as Y, publishDate as X from FeedItem where
publishDate between '2008-08-01' and '2008-08-31' group by DAY(publishDate)
order by publishDate asc;
+-------+---------------------+
| Y     | X                   |
+-------+---------------------+
| 26663 | 2008-08-01 00:00:00 |
| 22478 | 2008-08-02 00:00:00 |
| 25745 | 2008-08-03 00:00:00 |
| 30576 | 2008-08-04 00:00:00 |
| 31351 | 2008-08-05 00:00:00 |
| 31084 | 2008-08-06 00:00:00 |
| 31245 | 2008-08-07 00:00:00 |
| 29518 | 2008-08-08 00:00:00 |
| 26001 | 2008-08-09 00:00:00 |
| 28687 | 2008-08-10 00:00:00 |
| 32957 | 2008-08-11 00:00:00 |
| 33251 | 2008-08-12 00:00:00 |
| 33062 | 2008-08-13 00:00:00 |
| 33960 | 2008-08-14 00:00:00 |
| 31034 | 2008-08-15 00:00:00 |
| 26726 | 2008-08-16 00:00:00 |
| 27543 | 2008-08-17 00:00:00 |
| 36887 | 2008-08-18 00:00:00 |
| 35376 | 2008-08-19 00:00:00 |
| 34573 | 2008-08-20 00:00:00 |
| 33889 | 2008-08-21 00:00:00 |
| 30604 | 2008-08-22 00:00:00 |
| 26875 | 2008-08-23 00:00:00 |
| 27356 | 2008-08-24 00:00:00 |
| 33438 | 2008-08-25 00:00:00 |
| 33102 | 2008-08-26 00:00:00 |
| 31720 | 2008-08-27 00:00:00 |
| 26133 | 2008-08-28 00:00:00 |
| 22781 | 2008-08-29 00:00:00 |
| 20198 | 2008-08-30 00:00:00 |
|    20 | 2008-08-31 00:00:00 |
+-------+---------------------+


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/





Re: Buzz measurement - Aggregate functions

2008-10-10 Thread Uri Boness
you can try using the field collapse patch (currently in JIRA). You'll 
probably need to manually extract the patch code and apply it yourself 
as its latest update only applies to an earlier version of solr (1.3-dev).


http://issues.apache.org/jira/browse/SOLR-236

Cheers,
Uri



  




Re: sint in schema.xml

2008-10-10 Thread bburke71
U
--Original Message--
From: sanraj25
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Sent: Oct 9, 2008 10:13 PM
Subject: Re: sint in schema.xml





sanraj25 wrote:
 
 Hi,
   I created my own fields in schema.xml using the integer field type and
 the sint field type (solr.SortableIntField), but I can't tell the
 difference between the two. When exactly should sint be used? If we use
 sint, how does it make the field sortable? I tested with {sort=field name}
 in the query window, but it does not work properly. Please explain with a
 clear example.
 Thanks in advance.
 
 -sanraj
 
 


Need help with DictionaryCompoundWordTokenFilterFactory

2008-10-10 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

I am trying to solve the typical German Donaudampfschiff problem by 
using the DictionaryCompoundWordTokenFilter ...
Can anyone show me how to configure my schema.xml to use the 
DictionaryCompoundWordTokenFilterFactory?


Greets -Ralf-


Re: Buzz measurement - Aggregate functions

2008-10-10 Thread Marcus Herou
Man, a friend made me realize that facets will do exactly what I want!

Example:

GET 
http://192.168.10.12:8110/solr/feedItem/select/?indent=true&q=title:test*&rows=0&facet=true&facet.date=publishDate&facet.date.start=NOW/DAY-30DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">22</int>
 <lst name="params">
  <str name="facet.date.start">NOW/DAY-30DAYS</str>
  <str name="facet">true</str>
  <str name="indent">true</str>
  <str name="q">title:test*</str>
  <str name="facet.date">publishDate</str>
  <str name="facet.date.gap">+1DAY</str>
  <str name="facet.date.end">NOW/DAY+1DAY</str>
  <str name="rows">0</str>
 </lst>
</lst>
<result name="response" numFound="14814" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields"/>
 <lst name="facet_dates">
  <lst name="publishDate">
   <int name="2008-09-10T00:00:00Z">52</int>
   <int name="2008-09-11T00:00:00Z">57</int>
   <int name="2008-09-12T00:00:00Z">34</int>
   <int name="2008-09-13T00:00:00Z">38</int>
   <int name="2008-09-14T00:00:00Z">32</int>
   <int name="2008-09-15T00:00:00Z">53</int>
   <int name="2008-09-16T00:00:00Z">38</int>
   <int name="2008-09-17T00:00:00Z">53</int>
   <int name="2008-09-18T00:00:00Z">64</int>
   <int name="2008-09-19T00:00:00Z">49</int>
   <int name="2008-09-20T00:00:00Z">30</int>
   <int name="2008-09-21T00:00:00Z">37</int>
   <int name="2008-09-22T00:00:00Z">43</int>
   <int name="2008-09-23T00:00:00Z">37</int>
   <int name="2008-09-24T00:00:00Z">40</int>
   <int name="2008-09-25T00:00:00Z">21</int>
   <int name="2008-09-26T00:00:00Z">23</int>
   <int name="2008-09-27T00:00:00Z">21</int>
   <int name="2008-09-28T00:00:00Z">34</int>
   <int name="2008-09-29T00:00:00Z">43</int>
   <int name="2008-09-30T00:00:00Z">50</int>
   <int name="2008-10-01T00:00:00Z">39</int>
   <int name="2008-10-02T00:00:00Z">49</int>
   <int name="2008-10-03T00:00:00Z">25</int>
   <int name="2008-10-04T00:00:00Z">22</int>
   <int name="2008-10-05T00:00:00Z">15</int>
   <int name="2008-10-06T00:00:00Z">28</int>
   <int name="2008-10-07T00:00:00Z">4</int>
   <int name="2008-10-08T00:00:00Z">0</int>
   <int name="2008-10-09T00:00:00Z">0</int>
   <int name="2008-10-10T00:00:00Z">0</int>
   <str name="gap">+1DAY</str>
   <date name="end">2008-10-11T00:00:00Z</date>
  </lst>
 </lst>
</lst>
</response>
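
A request like the one above can also be assembled in code, so that characters such as "+" are escaped consistently. The following is only a minimal Java sketch: the class and method names are hypothetical (they are not part of Solr or SolrJ), and the parameter values are simply the ones echoed in the response header of this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class DateFacetQuery {

    // Parameters for one facet bucket per day over the last 30 days.
    static Map<String, String> dailyCounts(String query, String dateField) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", query);
        p.put("rows", "0"); // only the counts are needed, not the documents
        p.put("facet", "true");
        p.put("facet.date", dateField);
        p.put("facet.date.start", "NOW/DAY-30DAYS");
        p.put("facet.date.end", "NOW/DAY+1DAY");
        p.put("facet.date.gap", "+1DAY");
        return p;
    }

    // Joins the parameters into a query string, escaping keys and values.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }
}
```

The resulting string would be appended after ".../select/?" on whatever host and core are in use; the escaped "+" appears as %2B, as in the hand-written URL.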











-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: solr 1.3 list of language managed org.apache.lucene.analysis

2008-10-10 Thread sunnyfr

Thanks Hoss, 

But what about SnowballPorterFilterFactory: which languages are taken into
consideration?
Italian, Dutch, Portuguese? What else?
Thanks






DateField

2008-10-10 Thread sanraj25

Hi,
I created a field using the date field type, with default=NOW. Then I
indexed many documents. Now my question is: what is the purpose of values
such as NOW-1DAY and NOW-1HOUR? How can we use the DateField efficiently?
Can we use this field while searching? Can we pass these parameters when
sending a query?

Thanks in Advance

-sanraj



Re: sint in schema.xml

2008-10-10 Thread Francisco Sanmartin
In Lucene, all data is stored as strings, so a field defined as integer 
or sint is a string inside Lucene. If you sort numbers represented as 
strings, this is what happens:


example numbers:    1,2,3,4,5,6,7,8,9,10,11,12,13
ordered as strings: 1,10,11,12,13,2,3,4,5,6,7,8,9

That's why there is a field type called sint, which means "sortable int". 
It is the same as int, except that it orders the numbers properly.


example numbers: 1,2,3,4,5,6,7,8,9,10,11,12,13
order as int:    1,10,11,12,13,2,3,4,5,6,7,8,9
order as sint:   1,2,3,4,5,6,7,8,9,10,11,12,13
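
The string-ordering effect described above can be reproduced in plain Java. This is a small sketch that has nothing to do with Solr's internals, and the class name is hypothetical. Whether the effect shows up in Solr when sorting, or only in range queries, depends on how the field type is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; plain Java, not Solr code.
class StringVsNumericOrder {

    // Converts each number to its decimal string and sorts lexicographically.
    static List<String> sortAsStrings(List<Integer> numbers) {
        List<String> strings = new ArrayList<>();
        for (int n : numbers) {
            strings.add(Integer.toString(n));
        }
        Collections.sort(strings);
        return strings;
    }

    // Sorts the same numbers by numeric value.
    static List<Integer> sortAsNumbers(List<Integer> numbers) {
        List<Integer> copy = new ArrayList<>(numbers);
        Collections.sort(copy);
        return copy;
    }
}
```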

Pako






  




Re: solr 1.3 list of language managed org.apache.lucene.analysis

2008-10-10 Thread Erik Hatcher


On Oct 10, 2008, at 11:18 AM, sunnyfr wrote:

But what about SnowballPorterFilterFactory: which languages are taken into
consideration?
Italian, Dutch, Portuguese? What else?


See here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-b80fb581f4e078142c694014f1a8f60c0935e080 



Erik



Re: Solr indexing not taking all values from DB.

2008-10-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
The DIH status says 10 rows, which means only 10 rows were fetched for
that query. Do you have any custom transformers that eat up rows?

Try the debug page of DIH and see what is happening to the rest of the rows.



On Fri, Oct 10, 2008 at 5:32 PM, con [EMAIL PROTECTED] wrote:

 A simple question:
 I performed the following steps to index data from an Oracle DB into a Solr
 index and then search it:
 a) Set up the configuration for indexing data from the Oracle DB.
 b) Started the server.
 c) Ran a full import:
 http://localhost:8983/solr/dataimport?command=full-import

 But when I search using http://localhost:8983/solr/select/?q=
 not all of the results that match the search string are displayed.

 1) Are the above steps enough to get the DB values into the Solr index?
 My configuration (data-config.xml and schema.xml) appears to be correct,
 because I am getting SOME of the expected results (not all).
 2) Is there some value in solrconfig.xml, or in some other file, that limits
 the number of items being indexed? [For the time being I have only a few
 hundred records in my DB.]
 The query that I specify in data-config.xml yields around 25 results if I
 execute it in an Oracle client, whereas the status of the full import is
 something like:
 <str name="status">idle</str>
 <str name="importResponse">Configuration Re-loaded sucessfully</str>
 <lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">10</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2008-10-10 17:29:03</str>
  <str name="Time taken ">0:0:0.513</str>
 </lst>








-- 
--Noble Paul


Re: spellcheck: issues

2008-10-10 Thread Jason Rennie
Ah, now I see.  Results are always sorted first by the edit distance, then
by the popularity.  What I think would work even better than allowing a
custom compareTo function would be to incorporate the frequency directly
into the distance function.  This would allow for greater control over the
trade-off between frequency and edit distance.  I'll file a jira and look at
submitting a patch.
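
One way to picture folding frequency into the ranking is a single blended score. The Java sketch below is purely illustrative: the class, the log-based formula, and the weight are assumptions made for this example, not Lucene's SpellChecker API and not the contents of any actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Lucene's SuggestWord.
class BlendedSuggestion {
    final String word;
    final double score; // string similarity, higher means closer
    final int freq;     // how often the word occurs in the index

    BlendedSuggestion(String word, double score, int freq) {
        this.word = word;
        this.score = score;
        this.freq = freq;
    }

    // Assumed formula: similarity plus a weighted, log-damped frequency term.
    static double blended(double score, int freq, double freqWeight) {
        return score + freqWeight * Math.log10(1.0 + freq);
    }

    // Returns the words ordered by descending blended score.
    static List<String> rank(List<BlendedSuggestion> in, double freqWeight) {
        List<BlendedSuggestion> sorted = new ArrayList<>(in);
        sorted.sort((a, b) -> Double.compare(
                blended(b.score, b.freq, freqWeight),
                blended(a.score, a.freq, freqWeight)));
        List<String> words = new ArrayList<>();
        for (BlendedSuggestion s : sorted) {
            words.add(s.word);
        }
        return words;
    }
}
```

With freqWeight set to 0 this reduces to ordering by similarity alone; raising it lets a frequent word overtake a slightly closer but rare one, which is the trade-off described above.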

Cheers,

Jason

On Thu, Oct 9, 2008 at 9:22 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:

 Sorting in the SpellChecker is handled by the SuggestWord.compareTo()
 method in Lucene.  It looks like:
 public final int compareTo(SuggestWord a) {
   // first criteria: the edit distance
   if (score > a.score) {
     return 1;
   }
   if (score < a.score) {
     return -1;
   }

   // second criteria (if first criteria is equal): the popularity
   if (freq > a.freq) {
     return 1;
   }

   if (freq < a.freq) {
     return -1;
   }
   return 0;
 }

 I could see you opening a JIRA issue in Lucene against the SC to make it so
 that the sorting could be overridden/pluggable.  A patch to do so would be
 even better ;-)

 Cheers,
 Grant




-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/


local solr?

2008-10-10 Thread Robert Najlis
Hi, I have been looking at local solr, and I was wondering about the state
of integration with Solr.  Do you have any idea of when local solr might be
integrated in with Solr, or is this still an open question?

From looking at previous posts, it looked like part of the problem had to do
with GeoTools being LGPL.   If that is the case, would OpenMap provide a
solution to this problem?  It is also a Java based mapping platform, that I
think would work better with the Apache license.

Thanks.

Robert


RE: Need help with DictionaryCompoundWordTokenFilterFactory

2008-10-10 Thread Steven A Rowe
Hi Ralf,

On 10/10/2008 at 10:57 AM, Kraus, Ralf | pixelhouse GmbH wrote:
 I am trying to solve the typical german Donaudampfschiff-
 problem by using the DictionaryCompoundWordTokenFilter ...
 Anyone can show me how to configure my schema.xml to use the
 DictionaryCompoundWordTokenFilterFactory ???

Minimally, add the following inside the <analyzer> section for your field type:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictFile="/path/to/your/dictionary" />

You can also add the following (optional) attributes:

  - minWordSize (default: 5)
  - minSubwordSize (default: 2)
  - maxSubwordSize (default: 15)
  - onlyLongestMatch (default: true)
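
To give a feel for what these attributes control, here is a rough Java sketch of dictionary-based compound splitting. It is an illustration of the general idea only, not Lucene's implementation; the class and method names are hypothetical, and the dictionary is assumed to contain lower-case words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of dictionary-based decompounding; not Lucene code.
class CompoundSplitSketch {

    static List<String> subwords(String token, Set<String> dictionary,
                                 int minWordSize, int minSubwordSize,
                                 int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (token.length() < minWordSize) {
            return out; // short tokens are not split at all
        }
        String lower = token.toLowerCase(Locale.ROOT);
        // Try every start position and every allowed subword length.
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length();
                 len++) {
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    if (onlyLongestMatch) {
                        longest = candidate; // later candidates are longer
                    } else {
                        out.add(candidate);
                    }
                }
            }
            if (longest != null) {
                out.add(longest);
            }
        }
        return out;
    }
}
```

With a dictionary of "donau", "dampf" and "schiff", the token "Donaudampfschiff" would yield those three subwords under this sketch.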

FYI, the compound package summary in the nightly trunk Lucene contrib javadocs 
has some useful information:

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/compound/package-summary.html

Steve



Re: sint in schema.xml

2008-10-10 Thread Chris Hostetter

: as integer or sint, in lucene are strings, and if you try to sort numbers
: represented as strings what happens is this:

Field sorting in Solr on both IntField and SortableIntField should work 
because they both use the 'integer' FieldCache under the covers -- but 
where you'll really see a difference is in Range Queries (it's 
admittedly a slightly confusing aspect of the name, but it's spelled 
out fairly well in the example schema.xml) ...

<!-- numeric field types that store and index the text
     value verbatim (and hence don't support range queries, since the
     lexicographic ordering isn't equal to the numeric ordering) -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
...
<!-- Numeric field types that manipulate the value into
     a string value that isn't human-readable in its internal form,
     but with a lexicographic ordering the same as the numeric ordering,
     so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
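
To illustrate what "a lexicographic ordering the same as the numeric ordering" means, here is a small Java sketch of one possible encoding. It is an assumption made for illustration (offset plus zero padding) and is not the encoding SortableIntField actually uses internally; the class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical encoding for illustration; not Solr's internal format.
class SortableIntSketch {

    // Shifts the value into 0..2^32-1 and zero-pads it to 10 digits, so that
    // comparing the strings gives the same order as comparing the ints.
    static String encode(int value) {
        long shifted = (long) value - Integer.MIN_VALUE;
        return String.format(Locale.ROOT, "%010d", shifted);
    }

    // Reverses encode().
    static int decode(String encoded) {
        return (int) (Long.parseLong(encoded) + Integer.MIN_VALUE);
    }
}
```

A range query is essentially a comparison of indexed terms, so with the verbatim strings "2" sorts after "10", while the encoded forms compare in numeric order, negatives included.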

:   use? If we use sint how it is sortable? I test by {sort =field name} in
:   query window .but it's not work properly.please tell me with clear example
:   thanks in advance

can you elaborate on what exactly you have tried, and what you mean by 
"it's not work properly"?


-Hoss



Re: sint and omitnorms

2008-10-10 Thread Chris Hostetter

: query window .but it's not work properly.I have one more question.What is
: the purpose of omitNorms attribute?If we use omitNorms what will happen? 
: please tell me with clear example
: thanks in advance

as before, the example schema.xml explains this in some detail...

 omitNorms: (expert) set to true to omit the norms associated with
   this field (this disables length normalization and index-time
   boosting for the field, and saves some memory).  Only full-text
   fields or fields that need an index-time boost need norms.
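
As a rough picture of what length normalization does: in Lucene's default similarity of this era the length norm is, as far as I know, 1/sqrt(number of terms in the field), so a match in a short field counts for more than the same match in a long one. The Java sketch below only restates that formula; the class name is hypothetical, and it ignores the lossy single-byte storage of norms and any index-time boost.

```java
// Hypothetical restatement of the default length-norm formula.
class LengthNormSketch {

    // Shorter fields get a larger norm, so a match in them scores higher.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

With omitNorms="true" this factor is not stored, so field length stops influencing the score. That is usually what you want for fields such as integer or sint, which hold a single value.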



-Hoss



Best way to prevent max warmers error

2008-10-10 Thread sundar shankar
Hi,
 We have an application with more than 2.5 million docs currently. It is hosted 
on a single box with 8 GB of memory. The number of warmers configured is 4, and 
a cold searcher is allowed too. The application is based on data entry, and a 
commit happens every time data is entered. We optimize every night. When 
lots of users access the application in parallel, we see the application run 
out of warmers and throw an exception. 
Is there any suggestion on how we can handle this?

The number of concurrent users is currently about 8 and will grow to 40 soon, 
and more a little later. I remember discussions where people advised against 
using more warmers and said 2-4 should be more than enough for applications 
of my size. I am not sure what has to be done. Please advise.

Regards
Sundar


Re: local solr?

2008-10-10 Thread Ryan McKinley

check:
https://issues.apache.org/jira/browse/LUCENE-1387

My progress has stumbled since I could not get the tests to work...  I  
am currently not using this in my own projects, so i'm not yet  
comfortable pushing to finish it.  If you get it up and running with  
success, that could get the ball rolling.


The use of GeoTools was limited to an accurate great circle  
calculation -- for now, this is replaced with a 'world as ellipse'  
approximation, but in the future we could plug in other distance  
calculations.


ryan






Solr has limit to number of returned results?

2008-10-10 Thread Choi, David
Hi everyone, I have a (hopefully) basic question..

Does solr have a max. limit on the number of returned results?
I get the following error: HTTP Status 500 - maxClauseCount is set to 1024 
org.apache.lucene.search.BooleanQuery$TooManyClauses when I do a query that 
essentially amounts to asking for q=*

thanks
- David Choi



Re: Solr has limit to number of returned results?

2008-10-10 Thread Alok Dhir
Clauses are not the results -- they are what you're sending in as the  
query.  Apparently it's larger than 1024 clauses...
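
For context: in Lucene versions of this era a wildcard or prefix query is typically rewritten into a BooleanQuery with one clause per matching indexed term, which is how a very short query string can exceed the clause limit. The Java sketch below imitates that expansion over a plain list of terms. It is a simplified illustration with hypothetical names, not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of prefix expansion; not Lucene code.
class PrefixExpansionSketch {

    // Collects one "clause" per indexed term that starts with the prefix and
    // fails once the number of clauses would exceed maxClauseCount.
    static List<String> expandPrefix(String prefix, List<String> indexedTerms,
                                     int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                if (clauses.size() >= maxClauseCount) {
                    throw new IllegalStateException(
                            "maxClauseCount is set to " + maxClauseCount);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }
}
```

A prefix that matches most of the terms in an index therefore hits the limit regardless of how many documents would have been returned.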


On Oct 10, 2008, at 5:41 PM, Choi, David wrote:


Hi everyone, I have a (hopefully) basic question..

Does solr have a max. limit on the number of returned results?
I get the following error: HTTP Status 500 - maxClauseCount is set  
to 1024 org.apache.lucene.search.BooleanQuery$TooManyClauses when I  
do a query that essentially amounts to asking for q=*


thanks
- David Choi





Re: local solr?

2008-10-10 Thread Robert Najlis
Thanks,

looks like we need to wait for the tests to be updated...

I would not mind putting some time into this, but I would not know where to
begin with the tests, so I will wait on that.

Was geotools the only LGPL piece, or are there other licensing issues?

thanks

Robert











SOLR query times

2008-10-10 Thread Sammy Yu
Hi,
   I'm using SOLR 1.3 on an index with approximately 8 million
documents.  I would like to disable SOLR's caches so that it is easier
for me to test the scenario where there is a small likelihood of cache
hits.  I've disabled caching by commenting out the filterCache,
queryResultCache, and documentCache sections in solrconfig.xml as
suggested by the Wiki.  It seems to be disabled, because the admin
interface no longer shows any entries in the Cache section.

However, it appears that there is still some sort of caching taking
place.  The first time I make a specific query it takes around 100
msec; subsequent queries take around 15 msec.  Is there some
sort of caching happening at the Lucene level?

Thanks for your help,
Sammy Yu


RE: Solr has limit to number of returned results?

2008-10-10 Thread Lance Norskog

To select all, do star-colon-star *:*
To select a negative clause do   *:* AND -clause
To select a wildcard, h* and h?* work fine.  
Star as the only character, or star or ? as the first character are not
allowed.

These blow up with too many clauses: H*? and H*H and H*H*. And when they
don't blow up (Solr 1.3) they do not return any results when they should.

Lance

-Original Message-
From: Choi, David [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 10, 2008 2:41 PM
To: solr-user@lucene.apache.org
Subject: Solr has limit to number of returned results?

Hi everyone, I have a (hopefully) basic question..

Does solr have a max. limit on the number of returned results?
I get the following error: HTTP Status 500 - maxClauseCount is set to 1024
org.apache.lucene.search.BooleanQuery$TooManyClauses when I do a query that
essentially amounts to asking for q=*

thanks
- David Choi




Re: scoring individual values in a multivalued field

2008-10-10 Thread abhishek007

: unfortunately not possible.  lengthNorm is part of fieldNorm and for each 
: doc there is one fieldNorm per field name...

Thanks for the reply Chris, this solves part of my problem. I have explained
my problem in much more detail in a separate thread (as it would have been
out of context here). 

http://www.nabble.com/Querying-multivalued-field---can-scoring-formula-consider-only-matched-values--tt19865873.html#a19865873





